Ginkgo biloba is a taxonomically isolated seed plant
frequently  placed  in  an  evolutionary  grade  group  referred 
to  as  gymnosperms.  Precise  phylogenetic  relationships 
between  Ginkgo  and  other  gymnosperm  groups  remain 
uncertain.  Small  subunit  ribosomal  DNA  sequences  were 
isolated  from  a  Ginkgo  genomic  library.  One  group  of 
clones  represents  the  small  subunit  rRNA  genes  from  the 
major ribosomal repeat of Ginkgo, which are present in
approximately  16,000  copies  in  the  diploid  Ginkgo  genome. 
A  representative,  Gbr-1000,  from  this  group  was  subcloned 
and  sequenced.  A  representative,  Gbr-6700,  from  a  second 
group of clones was also sequenced and contains a ss rRNA-like
sequence that is interrupted by a 1.1 kb insert 700 bp
into  the  ss  rRNA-like  region.  These  ss  rRNA-like  sequences 
are  present  in  approximately  3,400  copies  in  the  Ginkgo 
genome.  The  Gbr-1000  sequence  was  compared  with  homologous 
sequences  from  other  green  plant  taxa.  Phylogenetic 
relationships  of  these  taxa  were  inferred  using  a  cladistic 
parsimony  analysis  of  the  combined  sequence  data.  A  single 
most  parsimonious  tree  was  found  which  supports  a 
monophyletic grouping of Ginkgo and the cycad Zamia pumila.
The  Gbr-6700  sequence  was  compared  with  that  of  Gbr-1000. 
The  homologous  regions  of  the  two  Ginkgo  sequences  are  86% 
similar  compared  with  92%-97%  sequence  similarities 
observed  between  ss  rRNA  sequences  of  available  seed  plant 
taxa.  The  Gbr-6700  sequence  represents  a  ss  rRNA 
pseudogene  which  is  present  in  multiple  copies  in  the 
Ginkgo  genome.  These  sequences  appear  to  be  undergoing 
concerted  evolution  within  the  group,  similar  to  that 
observed  for  eukaryotic  rRNA  gene  families. 

CHAPTER 1
INTRODUCTION 


Ginkgo  biloba  is  a  taxonomically  isolated  seed  plant. 
It  is  frequently  placed  in  an  evolutionary  grade  group 
referred  to  as  gymnosperms  along  with  cycads,  conifers  and 
gnetopsids.  All  eukaryotic  green  plants  have  historically 
been  placed  in  the  kingdom  Plantae  (Foster  and  Gifford 
1974). More recently, based on cladistic studies, a
grouping at the subkingdom level, the Chlorobionta, has been
proposed (Bremer 1985). In the classification proposed by
Bremer, Ginkgo is placed in the subclass Ginkgoidae within
the class Spermatopsida. Spermatopsida contains all extant
seed plants including the remaining gymnosperms in the
subclasses Cycadidae (cycads), Pinidae (conifers), and
Gnetidae (gnetopsids) with all flowering plants
(angiosperms) in the subclass Magnoliidae. Each of these
subclasses forms a cohesive group that is well defined
morphologically and anatomically. The Gnetidae are widely
regarded  as  the  most  probable  extant  sister  group  to  the 
angiosperms.  However,  relationships  among  the  cycads, 
Ginkgo,  and  the  conifers  remain  poorly  resolved  in  plant 
systematic  studies. 


Comparative  morphology  has  historically  provided  the 
basis  for  systematic  study  and  largely  defines  the 
relationships  of  plant  taxa  as  biologists  currently 
perceive  them.  In  the  past  decade  investigators  have 
significantly  increased  the  morphological  and  anatomical 
data  sets.  New  information,  re-examination  of  many 
characters  and  a  more  complete  treatment  of  fossil  taxa 
have  refined  plant  evolutionary  theory  (Mishler  and 
Churchill  1985,  Dahlgren  and  Bremer  1985,  Crane  1985a, 
1985b, Doyle and Donoghue 1986, 1987). These studies also
highlight those relationships that are weakly supported by
synapomorphic (shared derived) characters. The
relationships of cycads, Ginkgo and conifers to each other
and  other  seed  plants  remain  unclear  and  are  weakly 
resolved  in  systematic  studies. 

Molecular  sequences  of  homologous  macromolecules,  both 
at  the  nucleotide  and  amino  acid  level,  can  be  used  to 
infer  phylogenetic  relationships  between  extant  organisms 
(Zuckerkandl and Pauling 1965). These molecular data
provide  comparable  characters  that  can  be  used  across  a 
broad  range  of  organisms,  including  the  Chlorobionta. 

Within  the  eukaryotic  genome  different  types  of  DNA 
evolve  at  very  different  rates.  Non-coding  regions  are 
typically  more  variable  than  those  regions  coding  for 
polypeptides or functional RNAs, and different rates of
nucleotide substitution exist within these types of DNA as
well. Different regions of the DNA are useful for
systematic studies at various taxonomic levels. Amino acid
sequences  of  functional  polypeptides  were  among  the  first 
utilized  in  molecular  evolutionary  studies  and  have  been 
used  to  examine  relationships  across  a  wide  range  of  taxa 
including  the  green  plants  (Fitch  and  Margoliash  1967, 
Fitch 1976, Lumsden and Hall 1975, Martin and Dowd 1986).

DNA  sequences  that  code  for  functional  RNAs  (tRNA  and 
rRNA)  have  also  been  used  in  phylogenetic  studies.  These 
sequences  are  widely  distributed  in  eukaryotes  and  are 
highly  conserved  even  at  higher  taxonomic  levels.  The  5S 
ribosomal  RNA  (rRNA)  is  highly  repeated  in  most  eukaryotic 
nuclear  genomes  including  the  green  plants.  The  5S  rRNA 
sequences  have  been  determined  for  many  species  including 
28  green  plants  (Hori  and  Osawa  1979,  Hori  et  al.  1985). 
Chloroplast  4.5S  rRNA  sequences  have  also  been  examined  for 
purposes  of  plant  phylogenetic  reconstruction  (Bobrova  et 
al.  1987). 

Nuclear  sequences  for  the  small  and  large  subunit 
ribosomal  RNAs  are  present  in  the  genomes  of  all  eukaryotes 
and  have  also  been  used  for  phylogenetic  studies.  These 
rRNA  sequences  are  highly  conserved  both  in  primary 
sequence  and  at  proposed  secondary  structural  levels.  The 
small  subunit  (ca.  1800  base  pairs  [bp])  and  large  subunit 
(ca.  3000  bp)  rRNA  sequences  provide  a  tenfold  larger 
sample  size  than  most  other  macromolecules  thus  far 
employed  in  plant  molecular  evolutionary  studies. 


The  coding  regions  for  these  rRNAs  are  much  more 
highly  conserved  than  the  surrounding  spacer  regions. 
Within the coding region, different areas, or subdomains,
exhibit varying levels of sequence conservation, which
in turn can frequently be correlated with evolutionarily
conserved functional domains of proposed secondary
structure models (Brimacombe and Stiege 1985, Gutell et al.
1985). This range of variation in sequence conservation
makes  these  molecules  particularly  useful  in  studies 
comparing  relatively  closely  related  taxa  as  well  as  more 
distantly  related  taxa. 

Small  subunit  rRNA  sequences  are  now  available  for 
many  eukaryotes  including  several  diverse  protist  groups. 
The wide diversity at the phenotypic level in the various
protist groups causes frequent problems when trying to
gather comparable character information for systematic
study. Studies using small subunit rRNA sequences have
been extremely useful in examining phylogenetic
relationships among the various protist groups and their
affinities to other eukaryotic (Sogin et al. 1986, Gunderson
et al. 1987) and prokaryotic taxa (Lake 1989, Cedergren et
al. 1988).

Among  the  green  plants  the  evolutionary  relationships 
of  the  gymnosperm  groups  remain  unclear.  Phylogenies 
inferred  from  phenotypic  characters  differ  as  to  which,  if 
any,  of  these  gymnosperm  taxa  together  form  monophyletic 
groups (clades) and which taxa are separately derived
within the seed plant lineage. Only a few molecular
studies  have  addressed  relationships  of  these  gymnosperm 
taxa  to  each  other  and  the  angiosperms. 

Complete  small  subunit  rRNA  sequences  have  been 
published  for  only  a  few  green  plant  taxa.  These  include 
three angiosperms, a dicot and two monocots (both grasses),
a cycad and representatives from the Chlorophycophyta
(green algae). This study examines seed plant phylogeny
based on ribosomal RNA sequences. Several clones were
isolated from a Ginkgo genomic library which contain
full-length small subunit ribosomal RNA (ss rRNA) coding regions
as indicated by separate 5' and 3' ss rRNA probes. Upon
restriction  analysis  the  majority  of  these  clones  yield  a 
map  consistent  with  conserved  restriction  sites  in  plant  ss 
rRNA  coding  regions.  Restriction  maps  from  a  second  group 
of  clones,  however,  were  inconsistent  with  those  expected 
for  a  plant  ss  rRNA  sequence  in  the  size  of  the  inferred  ss 
rRNA  coding  region  and  in  restriction  pattern. 

The  nucleotide  sequence  was  determined  for  a 
representative clone from each of these two groups,
Gbr-1000 and Gbr-6700, respectively. These sequences were
compared with each other and with sequences from other plant ss
rRNA  genes.  The  comparisons  indicate  that  the  ss  rRNA 
coding  region  contained  in  Gbr-1000  is  1811  nucleotides  in 
length  and  shares  high  sequence  similarity  with  other 
published plant ss rRNA sequences. The second sequence,
Gbr-6700, contains a ss rRNA-like sequence which is more
divergent in sequence composition and is interrupted by an
1100-nucleotide insert approximately 700 nucleotides into
the ss rRNA-like coding region. The Gbr-1000 sequence from
Ginkgo was aligned with other published plant ss rRNA
sequences, and these data were used to examine phylogenetic
relationships of these taxa using a cladistic parsimony
approach. The Gbr-6700 sequence, representing a ss rRNA
pseudogene, has been compared with the sequence from
Gbr-1000 and its structure characterized.


CHAPTER  2 
REVIEW  OF  LITERATURE 


Molecular Sequences in Plant Systematics

Sequences  from  biological  macromolecules  have  become 
available  in  recent  years  and  present  a  new  type  of 
character  for  use  in  comparative  biology.  Molecular 
evolutionary studies seek to reconstruct genealogies for DNA
sequences  either  directly  from  nucleotide  sequences  or  from 
amino  acid  sequences  of  proteins.  Phylogenetic 
relationships  can  be  inferred  from  systematic  studies  using 
these  sequences. 

Within  a  few  years  of  the  discovery  of  the  double 
helix  structure  of  DNA  (Watson  and  Crick  1953)  and  the 
semiconservative  nature  of  its  replication  (Meselson  and 
Stahl  1958),  the  use  of  molecular  sequences  for 
evolutionary  studies  was  proposed  (Zuckerkandl  and  Pauling 
1965). Initially, accumulation of homologous sequences
from various taxa was difficult. However, by the mid
1960s amino acid sequences for cytochrome c had been
determined for a number of taxa. Early studies utilized
these and amino acid sequences from other proteins.
Availability of nucleotide sequences followed shortly
thereafter with the development of biochemical methods for
their determination (Maxam and Gilbert 1977, Sanger et al.
1977).

The  amino  acid  sequences  from  cytochrome  c  were  among 
the  first  molecules  used  to  examine  evolutionary 
relationships  of  higher  plants  (Boulter  et  al.  1972).  An 
analysis  of  cytochrome  c  sequences  using  maximum  parsimony 
(Baba et al. 1981) included 26 plant taxa. This work
showed that phylogenies inferred from the lowest nucleotide
replacement score were inconsistent with phylogenies based
on phenotypic data for plant taxa. However, when other
parameters were considered (i.e., gene duplication and gene
expression) (Goodman 1979), a phylogeny more consistent
with  phenotypic  data  was  inferred. 
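
The "lowest nucleotide replacement score" criterion can be illustrated with a minimal sketch of a Fitch-style parsimony count for a single aligned site on a fixed rooted binary tree. The taxon names, tree shapes, and character states below are hypothetical and are not taken from Baba et al. (1981); the function name is likewise an assumption made for illustration.

```python
def fitch_score(tree, states):
    """Return (possible_states, changes) for one site on a rooted binary tree.

    tree:   a leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping leaf name -> observed character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], states)
    right_set, right_cost = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:
        # The two subtrees can agree on a state: no extra change is needed.
        return common, left_cost + right_cost
    # No shared state: one replacement must occur on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical site pattern for four taxa.
site = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
changes_1 = fitch_score(tree_1, site)[1]
changes_2 = fitch_score(tree_2, site)[1]
```

Summing such counts over all sites gives a tree length, and the topology with the smallest total is preferred under parsimony. In this toy case the first tree would be favored because it requires fewer changes.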

Partial  amino  acid  sequences  of  plastocyanin  have  been 
used  to  examine  familial  relationships  among  angiosperms 
(Boulter  et  al.  1979).  These  sequences  evolve  at  a  faster 
rate  than  those  of  cytochrome  c  and  are  therefore 
applicable  at  the  familial  level  rather  than  higher 
taxonomic  levels.  Sequences  for  40  members  of  ten  families 
were  examined  and  results  of  the  study  showed  that  members 
of  a  family  group  together  with  one  exception. 
Representatives  from  two  genera  of  the  Fabaceae  group 
together  but  are  separated  from  the  remaining  seven  genera 
of  the  family  suggesting  limits  to  resolution  obtained  from 
these data. Further work has been done using plastocyanin
amino acid sequences to examine relationships among genera
of the Ranunculaceae and its relationship to other plant
families (Grund et al. 1981). These data grouped the five
genera of the Ranunculaceae but inferences of relationships
between families were not clear, possibly due to the number
of taxa that were not represented in the study.

Sequences  for  cytochrome  c  and  plastocyanin  continued 
to  accumulate  and  amino  acid  sequences  from  other  proteins 
were determined as well. The amino acid sequences for the
small subunit of ribulose bisphosphate carboxylase (RBC - SSU)
were determined for several plant taxa and their use in
phylogenetic reconstruction examined (Martin et al. 1983).
In addition to these, sequences for 5S rRNA were beginning
to accumulate (Dyer 1982).

Martin  et  al.  (1985)  examined  angiosperm  phylogeny 
using  amino  acid  sequences  from  four  proteins  (cytochrome 
c,  RBC  -  SSU,  plastocyanin,  and  ferredoxin)  and  the  5S 
rRNA.  Amino  acid  sequences  were  converted  to  inferred 
nucleotide  sequences,  and  a  parsimony  method  was  employed. 
In  addition  to  using  unweighted  data,  a  weighting  scheme 
was applied to the data. The ratios of observed/expected
incompatibilities  were  determined  for  nucleotide  positions 
and  used  to  weight  characters  so  that  positions  not  showing 
parallelism  were  favored.  Majority  rule  consensus  trees 
presented  indicate  that  genera  within  a  family  are  grouped 
together  and  inferred  relationships  between  families  were 
largely  consistent  with  phylogenies  based  on  phenotypic 
data.   Although  the  grouping  of  Fabaceae  and  Brassicaceae 
inferred from the data is not supported by phenotypic
evidence,  the  authors  discussed  the  need  for  more  complete 
data  sets  and  noted  that  detailed  comparisons  of  published 
phylogenies  could  not  be  made. 

Further  investigations  of  plant  relationships  using 
partial  amino  acid  sequences  from  RBC  -  SSU  were  conducted. 
Sequences  for  15  monocot  genera  and  four  gymnosperm  taxa 
were determined (Martin and Dowd 1986). This analysis
showed once again that genera within a family grouped
together. In addition to this, the monocot taxa grouped
together relative to dicot taxa and gymnosperm taxa.
Limits  on  resolution  were  apparent  at  higher  levels,  and 
the  precise  joining  of  some  internodes  was  not  clear. 
Although  members  of  the  Liliaceae  grouped  together,  precise 
relationships  within  the  family  were  not  resolved.  The 
node joining gymnosperms, dicots and monocots was also
ambiguous and therefore left unresolved in the presented
phylogeny. The inferred phylogeny did show that, of the
gymnosperms included, Ephedra grouped closer to the
angiosperms  than  cycad  and  conifer  taxa.  Also,  of  the 
monocots  included,  members  of  the  Alismatidae  grouped 
closest  to  the  dicots.  The  authors  noted  that  these 
sequences  are  useful  in  deriving  approximately  accurate 
phylogenies  but  that  sequences  from  additional 
macromolecules  would  be  needed  to  resolve  relationships 
where  internodes  are  short. 

By 1989 the RBC - SSU data set had been expanded to
include  representative  genera  from  124  families.  These 
sequences  were  analyzed  and  again  genera  within  a  family 
usually grouped together (Martin and Dowd 1989). Family
nodes  were  derived  when  members  of  a  family  grouped 
dichotomously  and  then  used  to  examine  familial 
relationships  as  well  as  those  above  the  family  level.  Of 
the 124 families represented, 102 were placed in 24 groups
and 22 families remained ungrouped, reflecting taxonomic
uncertainty.  This  analysis  closely  grouped  the  four 
included  gymnosperm  taxa  and  the  node  was  used  to  represent 
gymnosperms  in  subsequent  analyses  of  angiosperm  taxa.  The 
family  Schisandraceae  always  grouped  closest  to  the 
gymnosperms.  Other  major  features  of  inferred  phylogenies 
showed  that  monocots  grouped  together  and  that  these  were 
placed  closest  to  the  dicot  families  Piperaceae  and 
Nelumbonaceae.  Criteria  used  in  the  analysis  were  shown  to 
affect  the  results  significantly.  Using  a  majority  rule 
consensus,  80%  of  the  families  grouped  dichotomously 
whereas  using  a  strict  consensus  method  the  success  rate 
was  reduced  to  approximately  44%.  Based  on  evaluation 
using  independent  taxonomic  criteria  the  authors  noted  that 
the  intra-group  results  appear  to  be  approximately  correct 
and  that  strict  consensus  trees  may  be  too  strict  for  use 
with  these  data. 
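
The difference between the two consensus criteria can be sketched as follows. This is a minimal illustration, assuming each tree is represented simply as a set of clades (each clade a frozenset of taxon labels); the three toy trees and their labels are hypothetical and do not come from Martin and Dowd (1989).

```python
from collections import Counter


def strict_consensus(trees):
    """Keep only clades that appear in every tree."""
    return set.intersection(*(set(tree) for tree in trees))


def majority_rule(trees):
    """Keep clades that appear in more than half of the trees."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade for clade, n in counts.items() if n > len(trees) / 2}


# Three hypothetical trees over taxa A-E, written as sets of clades.
ab, cd, ce = frozenset("AB"), frozenset("CD"), frozenset("CE")
trees = [{ab, cd}, {ab, cd}, {ab, ce}]

strict = strict_consensus(trees)
majority = majority_rule(trees)
```

Here the clade (C, D) survives under majority rule but not under strict consensus, because a single conflicting tree is enough to remove it. This is one way to see why a strict consensus can resolve far fewer groups than a majority-rule consensus from the same set of trees.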

Sequences from organellar genomes can
also be utilized in comparative studies. Sequences for the
large subunit of ribulose bisphosphate carboxylase/oxygenase
(rbcL) from the chloroplast genome have been determined
for five angiosperms and a green alga. Sequences were also
available for three prokaryotes. Higher plant
representatives for ATP synthase (atpB), non-coding
sequences for a tRNA intron (trn VI) and a 5' leader of
rbcL (Ritland and Clegg 1987) were also utilized. The
analysis  of  these  data  was  carried  out  using  a  maximum 
likelihood  approach  to  examine  inferred  tree  topologies. 
All 15 possible topologies for five plant taxa were examined
using the rbcL sequences. Comparisons were also made
between  phylogenies  inferred  from  each  of  the  three 
different  codon  positions.  A  node  grouping  the  two 
monocots  was  strongly  supported  but  resolution  of  the  dicot 
taxa  was  less  certain.  A  bootstrap  resampling  method  was 
employed  to  examine  the  strength  of  the  inferred 
topologies.  This  indicated  that  a  node  joining  tobacco  and 
pea  with  spinach  placed  intermediate  relative  to  the 
monocots was weakly supported over alternative topologies.
Consistency between topologies inferred from the four
different types of sequences was examined. This indicated
that each of the three codon positions from rbcL and atpB
as  well  as  the  two  noncoding  sequences  support  the  same 
topology.  For  this  topology  the  monocots  are  grouped 
dichotomously  at  a  node  and  the  dicots  are  grouped  together 
in  an  unresolved  trichotomy.   Dendrograms  presented  based 
on a UPGM (unweighted pair groups method) analysis agree
with  other  results  in  grouping  the  monocots  together  and 
the  dicots  together.  However,  this  approach  supports  a 
grouping  of  spinach  and  tobacco  with  pea  as  intermediate  in 
contrast  to  the  spinach  intermediate  inferred  from  maximum 
likelihood  analysis. 
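
The figure of 15 possible topologies for five taxa matches the standard count of unrooted, fully bifurcating trees, (2n - 5)!! for n taxa. The sketch below assumes that this is the count intended in the text above; the function name is introduced here only for illustration.

```python
def unrooted_topologies(n_taxa):
    """Count unrooted, fully bifurcating tree topologies: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


five_taxa = unrooted_topologies(5)
```

The count grows very quickly with the number of taxa, which is why an exhaustive comparison of every topology is practical only for small data sets such as this one.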

One  possible  reason  for  the  discrepancy  is  that  UPGM 
assumes  equal  evolutionary  rates  while  maximum  likelihood 
does  not.  The  applicability  of  chloroplast  sequences  to 
phylogenetic  study  is  supported  by  the  consistency  of  the 
results  using  different  sequences.  This  is  apparent 
despite  unequal  rates  between  codon  positions,  different 
types  of  sequences  and  different  lineages. 
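
The equal-rates assumption can be made concrete with the three-point (ultrametric) condition: if all lineages evolve at the same rate, then in every triple of taxa the two largest pairwise distances are equal. The sketch below checks that condition. The distance values and taxon labels are hypothetical, and the function is an illustration rather than part of any published method.

```python
from itertools import combinations


def is_ultrametric(dist, taxa, tol=1e-9):
    """Check the three-point condition for every triple of taxa.

    dist maps frozenset({x, y}) -> pairwise distance between x and y.
    """
    for a, b, c in combinations(taxa, 3):
        d = sorted([
            dist[frozenset((a, b))],
            dist[frozenset((a, c))],
            dist[frozenset((b, c))],
        ])
        # Under equal rates the two largest distances must match.
        if abs(d[2] - d[1]) > tol:
            return False
    return True


taxa = ["A", "B", "C"]
# Equal rates: A and B each sit 1 unit from their ancestor, C is 2 from the root.
clock = {frozenset("AB"): 2.0, frozenset("AC"): 4.0, frozenset("BC"): 4.0}
# Same topology, but the lineage leading to B evolves three times faster.
fast_b = {frozenset("AB"): 4.0, frozenset("AC"): 4.0, frozenset("BC"): 6.0}

clock_ok = is_ultrametric(clock, taxa)
fast_ok = is_ultrametric(fast_b, taxa)
```

In the second matrix A appears equally distant from B and C, so a clustering method that pairs the closest taxa has no signal that A and B are sister taxa. Rate differences of this kind can mislead a method that assumes equal rates.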

Sequences  for  5S  rRNA  have  also  been  utilized  in 
phylogenetic  studies  (Kimura  and  Ohta  1973,  Hori  1975, 
Schwartz and Dayhoff 1976). These molecules are 116-120
nucleotides  in  length  and  are  present  in  a  wide  range  of 
organisms.  Hori  and  Osawa  (1979)  presented  a  study  based 
on  5S  rRNA  sequences  from  54  species  which  included  five 
angiosperms  and  a  green  alga.  The  rate  of  nucleotide 
substitution, Knuc, representing the number of replacements
per site per year (Hori 1975), was calculated for all pairs
of  5S  rRNA  sequences.  Phylogenetic  trees  were  inferred 
using  a  matrix  method.  The  phylogenetic  tree  presented 
grouped  the  green  plants  together  with  the  green  algal 
sequence  basally  divergent  within  this  group.  Later,  a 
study  focussing  on  plant  phylogeny  was  presented  which 
included sequences from 28 plant taxa (Hori et al.
1985). The phylogeny presented generally reflects plant
evolutionary theory in the relative order of divergence of
higher taxonomic groups. However, precise relationships
among some higher taxonomic groups (e.g., bryophytes, ferns
and  gymnosperms)  are  not  consistent  with  phenotypic 
studies.  The  green  algae  are  basal  in  the  proposed 
phylogeny  with  the  charophyte  Nitella  intermediate  between 
them  and  land  plants.  The  ferns  and  fern  allies  group 
together  as  do  the  bryophytes.  However,  these  two  groups 
share  a  node  after  splitting  from  other  higher  plants  which 
is  not  supported  by  phenotypic  data.  The  gymnosperm  taxa 
represented  (not  including  gnetopsids)  also  form  a  group. 
This  has  been  proposed  based  on  phenotypic  data  but  is 
neither  strongly  supported  nor  widely  accepted  by  plant 
systematists.  The  angiosperms  group  together  relative  to 
other plants but small Knuc values did not provide
resolution  within  the  group. 
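
A pairwise substitution measure of this kind can be sketched with the widely used Jukes-Cantor-type correction, K = -(3/4) ln(1 - (4/3) p), where p is the fraction of aligned sites that differ. Whether this is exactly how the values in the 5S rRNA studies were computed is an assumption here. The sketch gives a per-site distance only; converting it to a rate per year would require divergence times, which are not modeled. The example sequences are hypothetical.

```python
import math


def corrected_distance(seq_a, seq_b):
    """Jukes-Cantor-type corrected distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    p = differences / len(seq_a)
    if p >= 0.75:
        raise ValueError("sequences too divergent for this correction")
    # The correction allows for multiple substitutions at the same site.
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned fragments differing at 1 of 4 sites (p = 0.25).
k = corrected_distance("ACGT", "ACGA")
```

When two sequences differ at very few sites the corrected distance is close to zero. Small values like this carry little information for ordering branches, which is consistent with the lack of resolution noted above for closely related taxa.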

Chloroplast 4.5S rRNA sequences have also been
examined  for  purposes  of  plant  phylogenetic  study.  A  study 
by  Bobrova  et  al.  (1987)  included  sequences  from  ten  plant 
taxa,  two  bryophytes,  a  fern,  four  monocots  and  three 
dicots.  This  analysis  infers  a  grouping  of  the  two 
bryophytes  as  well  as  a  grouping  of  the  angiosperm  taxa. 
Within  the  angiosperms,  the  dicots  (Nicotiana,  Ligularia, 
Spinacia)  group  together  but  a  grouping  of  all  monocot  taxa 
is not well supported. The two grasses (Zea and Triticum)
group together but the positions of Acorus and Spirodela
are  not  well  resolved.  Another  result  inconsistent  with 
phenotypic  data  is  the  position  of  the  bryophytes 
intermediate  between  that  of  the  fern  and  the  angiosperms. 
Poor  resolution  can  at  least  in  part  be  attributed  to  the 
slow  rate  at  which  these  sequences  evolve.  For  example, 
Nicotiana and Ligularia have identical 4.5S rRNA sequences
but belong to separate orders within the Asteridae
(Cronquist 1981).

Some  of  the  most  extensive  studies  of  plant 
chloroplast  DNA  (cpDNA)  evolution  have  been  undertaken  by 
Palmer  and  coworkers  (Palmer  and  Thompson  1982,  Palmer 
1987, Palmer et al. 1988). Several methods of analyzing
cpDNA  have  been  examined  and  are  applicable  at  various 
taxonomic  levels.  These  include  the  study  of  the 
arrangement  and  structure  of  the  chloroplast  genome, 
restriction  enzyme  analysis,  and  direct  DNA  sequencing. 
Restriction  analysis  was  the  earliest  method  used  and  has 
been applied to phylogenetic problems from the inter-species
level to the intra-familial level. At lower
taxonomic  levels  data  can  be  based  on  presence  or  absence 
of  restriction  fragments.  At  higher  levels  restriction 
analysis  often  requires  complete  mapping  of  the  cpDNA. 
From  the  restriction  data,  nucleotide  substitutions  are 
inferred  and  used  to  construct  phylogenies.  Results  of 
these  types  of  analysis  have  been  particularly  useful  in 
grouping species of a genus, genera within families, and
have provided some information on relationships between
families. Additionally, some information at the
intra-specific level is derived from this type of analysis, but
the  slow  rate  of  cpDNA  evolution  somewhat  limits  the 
usefulness  of  restriction  data  at  this  level.  To  date 
investigations  of  higher  order  relationships  of  plants  have 
been  limited.  However,  initial  studies  using  cpDNA 
sequences (particularly from rbcL genes) from limited taxa
have indicated that this approach should be useful at
higher taxonomic levels. Additionally, the sequence
approach is useful within more diverse families where
restriction data alone do not provide sufficient
resolution. 

The  applicability  of  molecular  data  to  plant 
phylogenetic  studies  is  viewed  as  promising  by  many 
investigators  although  problems  and  limits  of  some 
molecular  data  are  recognized.  Other  investigators  are 
more  critical  and  have  chosen  not  to  include  molecular  data 
in  analyses  on  which  plant  classifications  are  based. 
Bremer  et  al.  (1987)  examined  5S  rRNA  sequences  and 
conducted  a  cladistic  parsimony  re-analysis  of  the  data. 
In  this  analysis  57  most  parsimonious  trees  were  found. 
Only  one  example  of  these  was  presented  but  it  was  highly 
incongruent  with  phenotypic  data.  The  authors  then 
excluded  the  5S  rRNA  data  from  the  analysis  citing 
extensive homoplasy at levels higher than those found in
morphological  data  sets. 

Steele  et  al.  (1988)  commented  on  the  quick  dismissal 
of  the  5S  rRNA  data  in  plant  systematic  studies.  The 
authors  discussed  several  problems  in  the  use  of  these 
sequences  and  offered  suggestions  on  how  some  of  these 
might  be  dealt  with.  Methodological  problems  exist  in 
analyzing  molecular  data.  Many  studies,  especially  early 
ones,  utilized  phenetic  approaches  in  molecular 
systematics. These are viewed less favorably than
cladistic  parsimony  methods  by  many  systematists.  However, 
more  recently  cladistic  studies  of  molecular  data  have 
become  more  common.  Other  problems  such  as  homoplasy, 
constraints  imposed  by  secondary  structure  of  the  molecule 
and  non-random  substitution  have  also  been  discussed. 

One  major  concern  when  using  structural  RNA  sequences 
is the effect of stem pairing regions on substitutional
rates.  Wheeler  and  Honeycutt  (1988)  examined  rates  of 
change  in  non-paired  (loop)  and  stem  regions.  They 
observed  a  higher  rate  of  sequence  differences  that 
maintain  base  pairing  than  expected  based  on  a  neutral 
model.  They  suggest  that  there  is  positive  selection  for 
the  second  substitution  in  stem  regions  which  restores  base 
pairing.  These  investigators  also  examined  phylogenetic 
reconstructions using stem regions only, loop regions only,
and the combined data. Their results indicated that
phylogenies inferred from stem positions and the combined
data were not congruent with those inferred from
non-pairing positions and that the latter were more consistent
with  phenotypic  data. 

Plant  molecular  data  sets  for  cytochrome  c,  RBC  -  SSU, 
plastocyanin,  and  ferredoxin  (Martin  and  Dowd  1986,  Boulter 
et  al.  1979)  have  recently  been  re-examined  using  a 
cladistic parsimony approach (Bremer 1988). Amino acid
sequences were converted to inferred nucleotide sequences.
The combined data for all four proteins contain potentially
informative positions for nine angiosperm families (usually
represented by more than one species). The data matrix was
analyzed using PAUP (Phylogenetic Analysis Using Parsimony)
developed by Swofford (1985). Two most parsimonious trees
are  inferred  from  this  data  matrix  which  require  161  steps 
(nucleotide  replacements)  with  a  consistency  index  of 
0.689.  This  indicated  that  50  steps  were  a  result  of 
parallelisms  and  reversals  and  therefore  homoplasy  is 
common  in  the  data.  Additionally  one  tree  at  162  steps  and 
three  at  163  steps  were  found.  A  strict-consensus  tree  for 
these  shortest  trees  found  showed  that  there  were  no 
monophyletic groups common to all five trees (161-163
steps). Although resolution is limited, these sequences
provided  information  on  some  relationships.  They  have  been 
useful  at  intra-familial  levels  and  often  have  shown  that 
among  families  two  families  are  more  closely  related  to 
each  other  than  either  is  to  other  families.   Once  again  it 
was noted that additional sequences were needed before the
usefulness  of  this  type  of  sequence  data  in  plant 
systematics  could  be  assessed. 
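
The link between the reported tree length (161 steps), the consistency index (0.689), and the roughly 50 homoplastic steps can be checked with a short calculation. This assumes the usual definition of the consistency index as the minimum possible number of steps divided by the observed number of steps; the function name is introduced only for this illustration.

```python
def homoplastic_steps(tree_length, consistency_index):
    """Extra steps beyond the minimum, given CI = minimum / observed."""
    minimum_steps = consistency_index * tree_length
    return tree_length - minimum_steps


extra = homoplastic_steps(161, 0.689)
```

With these inputs the minimum is about 111 steps, so about 50 of the 161 steps are attributable to parallelisms and reversals, in agreement with the figure quoted above.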

Mishler  et  al.  (1988)  also  reviewed  some  of  the 
problems in using molecular data for phylogenetic studies.
In  sequences  of  structural  RNAs,  compensating  substitutions 
in  stem  forming  regions  may  not  represent  independent 
characters  in  an  analysis.  One  solution  to  this  problem 
may  be  to  eliminate  one  of  two  pairing  positions  from  the 
data  matrix  (Steele  et  al.  1988).  A  weighting  system 
which  gives  less  weight  to  one  of  two  positions  in  a  paired 
region  may  also  be  useful  in  dealing  with  the  problem. 
However,  caution  should  be  exercised  as  many  secondary 
structure  models  have  not  been  tested  biochemically,  but 
instead  are  inferred  from  structural  models  of  homologous 
sequences  from  other  taxa. 
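
Both suggestions, dropping one member of each base pair or down-weighting it, amount to building a vector of character weights from a secondary structure model. The sketch below is a minimal illustration under that reading; the site indices, the pairing list, and the parameter name `partner_weight` are hypothetical.

```python
def stem_weights(n_sites, pairs, partner_weight=0.0):
    """Build per-site weights from a list of base-paired positions.

    pairs:          (i, j) tuples of 0-based paired site indices.
    partner_weight: weight given to the second member of each pair;
                    0.0 removes it, values between 0 and 1 down-weight it.
    """
    weights = [1.0] * n_sites
    for _, j in pairs:
        weights[j] = partner_weight
    return weights


# Hypothetical 6-site stem-loop: sites 0-5 and 1-4 pair, sites 2-3 form the loop.
pairs = [(0, 5), (1, 4)]
dropped = stem_weights(6, pairs)
halved = stem_weights(6, pairs, partner_weight=0.5)
```

Because the pairing list comes from a structural model, any error in that model carries straight through to the weights, which is the reason for the caution expressed above about untested secondary structures.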

Mishler  et  al.  (1988)  also  addressed  the 
transition/transversion  bias  reported  in  some  molecular 
data  sets.  Again  a  weighting  system  can  be  applied  to  the 
data  if  ratios  of  these  types  of  changes  are  known.  This 
can  itself  present  a  problem  when  both  types  of  change  have 
occurred  in  the  same  character  position  of  a  data  matrix. 
There  has  not  yet  been  a  computer  parsimony  program 
available  that  can  apply  weights  within  a  character,  but  a 
forthcoming  version  of  PAUP  (Swofford  1985)  will  reportedly 
contain  such  an  option. 
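
As an illustration of how the two classes of change might be distinguished and weighted, the following sketch classifies a substitution and assigns it a cost. The function names and the 1:2 weights are assumptions chosen for the example; they do not represent an option of PAUP or a ratio reported in the studies cited.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(a, b):
    """Return 'transition', 'transversion', or None for identical bases.

    U is treated as T so that RNA and DNA sequences can be mixed.
    Ambiguity codes are not handled in this sketch.
    """
    a = a.upper().replace("U", "T")
    b = b.upper().replace("U", "T")
    if a == b:
        return None
    same_ring = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
    return "transition" if same_ring else "transversion"


def weighted_cost(a, b, transition_weight=1.0, transversion_weight=2.0):
    """Cost of changing a to b under assumed (hypothetical) class weights."""
    kind = substitution_class(a, b)
    if kind is None:
        return 0.0
    return transition_weight if kind == "transition" else transversion_weight
```

Because the cost depends on the pair of states rather than on the character as a whole, a position showing both classes of change receives a different cost for each change.
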

Another problem discussed by Mishler et al. (1988) is
the  different  rates  at  which  various  sequences  evolve.  If 
substitution  rates  are  too  rapid  for  homologous  sequences 
at  a  given  taxonomic  level,  information  will  be  lost  due  to 
multiple  substitutions  at  individual  sites.  The  problem 
could  be  addressed  by  determining  which  molecules  evolve  at 
rates  useful  for  the  taxonomic  level  being  studied.  It  is 
therefore  not  a  question  of  whether  molecular  sequences  are 
appropriate  for  phylogenetic  study,  but  rather  which 
sequences  are  useful  at  which  taxonomic  levels. 

Homoplasy  also  presents  a  problem  in  molecular  data. 
In  morphological  data  considerable  character  analysis  is 
performed  prior  to  an  analysis.  This  has  not  been  the  case 
for  most  molecular  data.  An  advantage  of  molecular  data  is 
the  potential  to  utilize  very  large  data  sets.  Therefore, 
it  may  be  possible  to  acquire  a  large  number  of 
historically  informative  characters  despite  the  homoplasy 
present  in  the  data.  Homoplasy  is  not  restricted  to  plant 
sequences.  Miyamoto  et  al.  (1987)  encountered  the  problem 
in  primate  sequences  but  have  still  been  able  to  make  sound 
phylogenetic  inferences  by  recognizing  the  problems  and 
incorporating  methods  of  dealing  with  them  into  their 
analysis.  Although  homoplasy  is  a  problem  in  all  data  it 
may  be  more  prevalent  in  plant  sequences  than  in  animal 
sequences.  Plant  and  animal  phylogenies  have  been 
constructed  based  on  cytochrome  c  sequences.  The  amount  of 
homoplasy  due  to  convergence,  parallelism  and  reversals  was 
also examined (Syvanen et al. 1989). These investigators
found  that  homoplasy  was  more  common  in  the  plant  data  than 
in  the  animal  data  for  the  same  protein  sequence.  The  data 
sets  for  26  plant  species  and  27  animal  species  contain  84 
and  85  characters,  respectively.  For  unrooted  phylogenies, 
the  consistency  index  for  plant  sequences  was  0.50  compared 
with  0.68  for  animal  sequences. 

One  of  the  greatest  advantages  of  molecular  sequences 
is  the  potentially  large  number  of  characters  they  can 
provide.  However,  most  plant  studies  to  date  have  used 
relatively  small  sequences  (<150  nucleotides)  because  of 
the  availability  of  homologous  sequences  from  a  number  of 
taxa.  This  is  changing  as  more  rapid  sequencing  methods 
are  becoming  available. 

Small  and  Large  Subunit  rRNAs 

The  coding  regions  for  the  small  subunit  ribosomal  RNA 
(ss rRNA) and the large subunit ribosomal RNA (ls rRNA) are
approximately  1800  and  3000  nucleotides,  respectively. 
These  are  present  in  all  prokaryotes  and  eukaryotes.  In 
most  eukaryotes  and  in  all  plant  taxa  examined  these  are 
arranged  in  tandemly  repeated  arrays  and  contain  the  5.8S 
rRNA between the ss rRNA and ls rRNA. The coding regions
for  the  rRNAs  are  separated  by  spacer  DNA  (internal 
transcribed  spacer  or  ITS)  which  is  transcribed  as  part  of 
the  transcription  unit.    Intergenic  spacer  (IGS)  DNA  is 
also present between the tandemly arrayed units and was
originally termed nontranscribed spacer or NTS (Jorgensen
and Cluster 1988). However, portions of this at the 5' end
of the ss rRNA and the 3' end of the ls rRNA are
transcribed  as  part  of  the  transcription  unit  (Long  and 
Dawid  1980,  Dvorak  and  Appels  1982).  The  large 
transcription  unit  is  subsequently  processed  in  the  nucleus 
to  yield  mature  rRNA  molecules. 

Plant genomes contain a large number of ribosomal
repeat units ranging from 200 to 22,000 copies per haploid
genome for species examined (Rogers and Bendich 1987).
Within the IGS region of plant species examined there are
subrepeats of 100 to 300 nucleotides. The number of these
subrepeats has been shown to vary between individuals of a
species, within individuals (Appels and Dvorak 1982,
Jorgensen et al. 1987), and between neighboring gene
repeats on a chromosome (Rogers and Bendich 1987).
Ribosomal repeats may be present on more than one
chromosome and in some cases where this occurs, individual
size classes related to subrepeat numbers have been
correlated to different chromosomes (Appels and Honeycutt
1986). Inbred lines of maize have been examined in which
there is no detectable size variation in rRNA repeats but
some non-inbred lines do show variability among individuals
in the number of subrepetitive elements (Rogers and Bendich
1987).

The function of these subrepetitive elements has not
been  experimentally  determined  for  plants.  However, 
because  of  similarities  with  other  systems  inferences  have 
been  made  from  animal  and  protist  studies.  Evidence  has 
suggested  that  these  subrepetitive  regions  are  hot  spots 
for  recombination  and  provide  a  mechanism  for  change  in 
copy  number  of  rRNA  repeats.  These  subrepeats  have  been 
called  enhancers  and  may  bind  factors  involved  in  RNA 
polymerase I attachment, thus affecting transcriptional
rates.  Another  possible  role  of  these  elements  is  that 
they  function  as  RNA  processing  sites  and/or  transcription 
termination sites (Rogers and Bendich 1987). These sites
have been shown to retain polymerase instead of allowing
dissociation at the 3' end of the ls rRNA gene. In some
systems these elements have been shown to be necessary for
transcription of the adjacent gene of a repeated array
(McStay and Reeder 1986). It has also been suggested
(Rogers  and  Bendich  1987)  that  if  these  elements  do  in  fact 
contain  terminators  they  may  play  a  role  in  preventing 
unnecessary  transcription  in  genomes  containing  large 
numbers  of  rRNA  repeats. 

Although  some  types  of  heterogeneity  exist  among  rRNA 
repeat  units,  these  units  have  been  shown  to  be  largely 
homogeneous  within  an  individual  genome.  The  homogeneity 
is  presumably  due  to  concerted  evolution  of  the  repeat  rRNA 
units (Arnheim et al. 1980). Variation in an individual's
rRNA sequences has been observed in length of repeat units,
nucleotide sequence, copy number and base modification (i.e.,
methylation of cytosine residues). These different types
of variation can be useful in examining phylogenetic
relationships at various taxonomic levels (Jorgensen and
Cluster 1988).

The  most  variable  portion  of  the  rRNA  repeat  has  been 
found  within  the  region  of  the  IGS  that  contains  the 
subrepetitive elements. The next most common variation
reported  is  in  other  regions  of  the  IGS.  Spacer  regions 
within  the  repeat  (ITS)  between  rRNA  coding  regions  are 
more  highly  conserved  than  the  IGS  but  are  much  more 
variable than the ss rRNA, ls rRNA and the 5.8S rRNA coding
regions.  These  spacer  regions  have  been  useful  in 
examining  plant  relationships  from  the  intra-populational 
level  to  the  inter-generic  level  (Schaal  et  al.  1987, 
Schaal  and  Learn  1988,  Hamby  and  Zimmer  1988). 

The  most  highly  conserved  portions  of  the  ribosomal 
repeat  are  the  regions  coding  for  mature  rRNA.  Different 
rates  of  change  are  also  observed  for  areas  within  these 
regions.  Generally  the  ss  rRNA  is  more  conserved  overall 
than is the ls rRNA but the more variable regions within
the ss rRNA are not as highly conserved as some regions of
the ls rRNA. Overall the 3' regions of each of these are
more conserved than their respective 5' regions and this
difference is more pronounced in the ls rRNA sequence.
Length  heterogeneity  has  not  been  detected  in  these  coding 
regions from restriction analysis of plant taxa (Jorgensen
and Cluster 1988). However, length differences of one to
six nucleotides are indicated from comparisons of ss rRNA
available for plant species (Eckenrode et al. 1985, Nairn
and Ferl 1988). The different rates of substitution
observed  within  these  coding  regions  make  these  sequences 
potentially  useful  from  the  intergeneric  level  to  higher 
taxonomic  levels  including  comparisons  between  kingdoms  of 
major  groups  of  organisms  (Jorgensen  and  Cluster  1988, 
Hasegawa et al. 1985, Gouy and Li 1989, Lake 1989). These
and  other  studies  (Wolters  and  Erdmann  1986,  Gunderson  et 
al.  1987)  using  the  limited  numbers  of  sequences  available 
have  been  consistent  with  a  monophyletic  grouping  of  plants 
including  the  green  algae  and  excluding  other  algae  from 
this  group. 

Few  complete  sequences  of  ss  rRNA  genes  are  available 
for  plant  species  but  the  rate  of  accumulation  of  these  is 
increasing  due  to  more  rapid  sequencing  methods.  The  most 
extensive  rRNA  studies  for  plants  to  date  have  utilized 
partial sequences from both ss rRNA and ls rRNA. Direct
sequencing  methods  for  RNA  have  allowed  data  accumulation 
for  a  large  number  of  plant  taxa. 

A  study  of  nine  species  from  the  Poaceae  (grass 
family)  has  been  conducted  using  the  fern  ally  Psilotum  as 
an outgroup (Hamby and Zimmer 1988). A parsimony approach
using PAUP was used. The data contained 1648 positions.
Of these, 244 positions were variable for all taxa, 119 were
variable  within  the  seed  plant  taxa,  and  85  were 
potentially  phylogenetically  informative. 

A  single  most  parsimonious  tree  was  inferred  from  the 
analysis which required 161 steps. Additionally, 59 next
most parsimonious trees were reported ranging from 162 to
167 steps. The results of the analysis were compared with
a classification proposed for the Poaceae (Watson et al.
1985). The most parsimonious tree grouped the members of
the subfamily Panicoideae (Zea, Tripsacum, Sorghum, and
Saccharum) monophyletically. Members of the subfamily
Pooideae also formed a monophyletic group (Triticum,
Hordeum, and Avena). The two members of the Bambusoideae,
Oryza and Arundinaria, however, did not group
monophyletically  in  the  rRNA  analysis.  The  classification 
of  Watson  et  al.  (1985)  placed  Hordeum  and  Triticum  in  the 
supertribe  Triticanae  while  Avena  is  placed  in  the 
supertribe  Poanae.  The  rRNA  analysis  did  not  agree  with 
the  classification  at  this  level.  For  the  most 
parsimonious  tree  (161  steps)  and  the  next  most 
parsimonious  trees  (162  -  167  steps)  found,  the  rRNA  data 
indicated  that  Hordeum  and  Avena  are  more  closely  related 
than  either  is  to  Triticum.  However,  the  authors  noted 
that the data contained only 22 informative characters for
members  within  the  Pooideae  and  that  accumulation  of  more 
sequence  data  may  improve  the  resolution  of  these  groups. 

A study of higher level plant relationships has also
been conducted using partial ss rRNA and ls rRNA sequences
(Zimmer  et  al.  1989).  About  1700  nucleotides  for  39  plant 
species  were  determined.  Of  these  514  were  variable,  350 
of  which  were  potentially  phylogenetically  informative. 
Two  equally  most  parsimonious  trees  were  inferred  from  PAUP 
analysis  of  the  data  which  required  1296  steps.  These 
differed only in the placement of two grass species. A
tree  presented  therein  showing  major  groups  of  most 
parsimonious  trees  reflects  higher  level  relationships  of 
vascular  plants  included  in  the  study.  The  analysis 
indicated  that  the  seed  plants  are  monophyletic.  The 
gymnosperms were resolved as paraphyletic but cycads,
Ginkgo and conifers form a monophyletic group. The
angiosperms were also grouped monophyletically. Nymphaea
and  Cabomba  group  together  and  are  reflected  as  the  sister 
group  to  the  other  dicots  included.  The  Arales  and  the 
grasses  also  grouped  together  in  this  analysis  but  two 
other  monocots  not  represented  in  the  tree  were  reported  to 
have  grouped  within  the  dicots.  Another  feature  of  this 
phylogeny is that the cycad-Ginkgo-conifer group was
resolved  as  the  sister  group  to  the  angiosperms  in  contrast 
to  the  widely  supported  view  that  the  Gnetales  are  the 
extant  sister  group  to  the  angiosperms. 

As in many other molecular data sets, a
transition/transversion bias has been reported for these
plant rRNA data. The data were also analyzed using the
evolutionary parsimony method developed by Lake (1987)
which  utilizes  primarily  transversional  differences  in  the 
sequences.  The  resolution  of  taxa  involved  was  reduced 
using  this  method  but  the  monophyletic  grouping  of 
angiosperms  was  retained  and  the  Gnetales  were  resolved  as 
the  sister  group  to  the  angiosperms  as  supported  by 
phenotypic  data  sets.  Further  examination  of  the  total 
data  was  conducted  using  PAUP  and  it  was  found  that  it 
required  only  three  additional  steps  over  the  most 
parsimonious  trees  to  place  the  Gnetales  as  the  sister 
group  to  the  angiosperms. 

It  is  widely  proposed  in  molecular  systematic  studies 
that  more  nucleotide  positions  as  well  as  sequences  from 
additional  taxa  are  needed  to  address  plant  phylogeny  with 
molecular  data.  Another  problem  facing  molecular 
systematists  is  the  availability  of  rigorous  computer 
methods  for  dealing  with  the  data.  Many  investigators  have 
remarked  on  the  need  for  more  careful  analysis  of  molecular 
data  sets  (Patterson  1987,  Bremer  1988,  Steele  et  al.  1988, 
Mishler  et  al.  1988,  Humphries  1989).  Caution  has  also 
been  urged  in  accepting  phylogenetic  systems  based  solely 
on rRNA sequences (Rothschild et al. 1986) but few, if any,
molecular  systematic  studies  have  advocated  such  an  extreme 
approach.  It  is  important  that  all  available  biological 
data  be  considered  when  addressing  phylogenetic 
relationships.


CHAPTER  3 
MATERIALS  AND  METHODS 


Cloning  and  Sequencing  of  ss  rRNA  Genes  from  Ginkgo  biloba 

Genomic  DNA  for  Ginkgo  biloba  was  isolated  from  fresh 
young  leaves  as  described  by  Rivin  et  al.  (1982).  Lambda 
vector  DNA,  EMBL3  (Frischauf  et  al.  1983),  was  grown  using 
K803  host  cells  in  NZCYM  media  and  isolated  by  the  CsCl 
procedure  as  described  by  Maniatis  et  al.  (1982).  The 
plasmid  pUC  19  (Messing  and  Vieira  1982)  and  subclones 
therein were grown in host cell line TG1 (Gibson 1984) in
YT  media  (Miller  1972).  Plasmid  DNA  was  isolated  by  the 
alkali  lysis  method  (Birnboim  and  Doly  1979)  and  purified 
over  two  successive  CsCl  gradients.  M13  sequencing 
vectors,  MP  18  and  MP  19,  and  subclones  therein  were 
cultured using TG1 host cells in YT media and isolated by a
PEG (polyethylene glycol) precipitation method (Messing
1983).

A  lambda  genomic  library  for  Ginkgo  biloba  was  made 
using EMBL3. Bam HI compatible lambda arms were prepared
and genomic DNA was partially digested with Bam HI.
Genomic digestions were optimized to generate fragments in
the 15-20 kb range. Genomic DNA was ligated into EMBL3
arms at a 1:1 molar ratio overnight at 14° C. Recombinant
phage DNA was packaged using Packagene extract (Promega
Biotech) according to the manufacturer's protocol. After
titering, the libraries were plated on 150 mm petri plates.
Duplicate blots of each plate were lifted on
nitrocellulose then denatured, neutralized and dried
(Benton and Davis 1977). Library lifts were probed using
plant small subunit ribosomal clones from Zamia pumila,
pZpr-1300  and  pZpr-1400,  which  contain  the  5'  and  3' 
portions  of  the  gene.  Inserts  from  these  two  clones 
released  by  restriction  digests,  550  bp  and  1.3  kb 
respectively,  were  isolated  on  dialysis  membrane  and  3MM 
paper  (Maniatis  et  al.  1982)  after  electrophoresis  on  1.0% 
agarose  gels  in  TBE  buffer  (0.089  M  Tris-borate,  0.089  M 
boric acid, 0.002 M EDTA [ethylenediamine-tetraacetic
acid]). Insert DNA was labelled by nick translation (Rigby
et al. 1977). Prehybridization was carried out in 5X SSC
(0.75 M NaCl, 0.06 M NaH2PO4, 0.005 M EDTA, pH 7.0), 5X
Denhardt's solution (Maniatis et al. 1982), 0.1% SDS (sodium
dodecyl sulfate) and 100 ug/ml herring sperm DNA for 4-6
hours at 65° C. Probe was added and hybridization
continued overnight. Blots were washed in 3X SSC with 0.5%
SDS once at room temperature for 5 minutes then twice at
65° C for 30 minutes each (Maniatis et al. 1982). Library
lifts were then blotted lightly between 3MM paper and
wrapped in PVC plastic (Fisher Scientific). Exposure was
overnight on Kodak XAR-5 film with X-Omatic intensifying
screens at -70° C. Positive plaques found on both of the
duplicate lifts were picked and purified through three
successive  screenings  using  the  Zamia  ribosomal  probes. 

Recombinant  lambda  clones  were  mapped  using  a  variety 
of  restriction  enzymes.  DNAs  were  separated  by 
electrophoresis  on  1.0%  agarose  gels  in  TBE  buffer.  Gels 
were blotted on nitrocellulose filters (Southern 1975).
Hybridization  was  carried  out  as  above  for  library  lifts. 

A  Sal  I  restriction  fragment,  ca.  6.1  kb,  from  lambda 
Gb-01  containing  the  entire  ss  rRNA  coding  region  was 
subcloned  into  the  Sal  I  site  of  pUC  19  to  become  pGbr- 
1000.  The  Sal  I  insert  of  pGbr-1000  contains  a  single  Eco 
RI  site.  pGbr-1000  was  digested  with  Sal  I  and  Eco  RI  and 
the  two  resulting  fragments  were  subcloned  into  the  Sal  I- 
Eco  RI  sites  of  the  M13  sequencing  vectors  MP  18  and  MP  19 
(mGbr-1011, 1012, 1021, 1022).

The  genomic  clone  lambda  Gb-06  contained  a  single  Bam 
HI  fragment,  ca.  12  kb,  which  hybridized  to  both  5'  and  3' 
ribosomal  probes.  Restriction  fragments  from  the  Bam  HI 
insert  hybridizing  to  these  probes  were  subcloned  into  M13 
sequencing  vectors  MP18  and  MP19. 

Twelve primers (rsp-734, 735, 778-787), each 15 bp in
length, were synthesized corresponding to highly conserved
regions  of  eukaryotic  small  subunit  ribosomal  DNA 
approximately  300  bp  apart  for  both  strands.  Sequencing 
reactions  were  carried  out  on  pGbr-1000  and  derived 
subclones  in  M13  using  these  twelve  primers  and  the  -40 
universal sequencing primer. Subclones in M13 derived from
pGbr-6000  and  lambda  Gb-06  were  sequenced  using  the  -40 
universal  sequencing  primer. 

Initial  dideoxy  sequencing  reactions  (Sanger  et  al. 
1977)  were  performed  using  the  KB  sequencing  kit  (Bethesda 
Research  Laboratories  [BRL]).  Duplicate  reactions  were 
performed  where  needed  to  resolve  regions  of  secondary 
structure.  These  were  accomplished  using  the  Sequenase  kit 
(United States Biochemical), the M13 dideoxy kit (New
England  Biolabs)  and  the  KB  kit  (BRL)  including 
substitutions  of  deaza-7-GTP  and  dITP  for  standard  dGTP. 
Reactions  were  also  modified  by  running  them  at  higher 
temperatures  ranging  from  46-50°  C.  Reactions  were  run  on 
gels  by  two  or  three  successive  loadings  spaced  4-6  hours 
apart.  Initial  gels  were  poured  with  5%  acrylamide  and  7M 
urea in 1X TBE buffer using 65 cm X 33 cm plates and wedge
spacers from 0.4 mm at the top to 0.8 mm at the bottom.
These gels were run at 1800-2400 volts to maintain a gel
surface temperature of 45-50° C. Duplicate gels were from
4-8% acrylamide and 8M urea in 1X TBE buffer. These were
run  at  2000-3000  volts  to  maintain  a  gel  surface 
temperature  of  55-62°  C.  Sequencing  gels  were  exposed  on 
Kodak  XRP-1  film  at  -70°  C.  Sequences  for  Gbr-1000  were 
also  generated  by  automated  methods  using  the  Genesis  2000 
DNA analysis system (Dupont) according to the manufacturer's
protocol.

Gene Copy Number Analysis

Restriction digests of plasmid, pGbr-1000 and pGbr-
6000 (a plasmid subclone containing a ~3.5 kb Hind III
fragment from Gbr-6700), and genomic DNA for copy number
analysis  were  run  on  1.0%  agarose  gels  in  TBE  buffer.  DNA 
samples  were  quantitated  using  a  Beckman  DU-68  micro- 
spectrophotometer.  Gels  were  blotted  overnight  (Southern 
1975)  on  Gene  Screen  (New  England  Nuclear)  using  0.025  M 
Na2HPO4/NaH2PO4 (pH 6.5). Blots were exposed to
ultraviolet light for 6 minutes to crosslink DNA to the
nylon membrane then prehybridized in 0.5 M NaPO4, 1% BSA
(bovine serum albumin), and 7% SDS for 6 hours at 68° C.
Fresh hybridization solution was added with the nick
translated probe and incubation continued overnight at 68°
C. Blots were washed in 40 mM NaCl, 40 mM NaPO4, 1 mM
EDTA, and 1% SDS once at room temperature for 5 minutes
then twice at 68° C for 30 minutes each. Blots were
exposed to Kodak XAR-5 film. The optical densities of
autoradiograph band areas were determined using a laser
densitometer (Molecular Dynamics model 300A) according to
the manufacturer's protocol.

Analysis  of  DNA  Sequences 

The  Ginkgo  sequences  were  entered  on  an  IBM  PC-XT 
computer  using  the  Microgenie  software  and  a  Gel  Mate  1000 
sonic  digitizer  (Beckman  Corp.).   The  Microgenie  software 
was used to overlap and merge gel sequences. The completed
sequences  were  then  examined  for  base  composition  and 
restriction  sites  using  the  Analysis  programs  of 
Microgenie. 

The  Gbr-1000  coding  region  and  ss  rRNA  sequences  for 
corn,  Zea  mays  (Messing  et  al.  1984),  rice,  Oryza  sativa 
(Takaiwa et al. 1984), soybean, Glycine max (Eckenrode et
al.  1985),  a  cycad,  Zamia  pumila  (Nairn  and  Ferl  1988),  and 
the  green  alga  Chlamydomonas  reinhardtii  (Gunderson  et  al. 
1987)  were  entered  and  pairwise  alignments  were  generated 
for  all  six  plant  taxa.  These  alignments  maximize 
similarity  of  the  two  sequences  involved  using  default 
parameters for gaps and mismatches (Queen and Korn 1984).
Sequence similarities for pairwise alignments were
calculated as the number of matching nucleotides divided by
the length of the alignment (Table 1).
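
The similarity measure defined above (matching nucleotides divided by alignment length) can be written as a short function. This is a minimal sketch and not the Microgenie implementation; the function name and the treatment of gap columns (counted in the length but never as matches) are assumptions.

```python
def percent_similarity(row_a, row_b):
    """Matching nucleotides divided by alignment length, as a percentage.

    Both arguments are rows of the same pairwise alignment, so they must
    have equal length; '-' marks a gap. A column counts as a match only
    when both rows carry the same nucleotide (assumed gap handling).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(row_a.upper(), row_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(row_a)
```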

A  multiple  alignment  of  all  six  plant  ss  rRNA 
sequences  was  constructed.  Computer  generated  pairwise 
alignments  were  used  to  first  align  the  five  seed  plant 
sequences,  then  incorporate  the  Chlamydomonas  sequence. 
The  alignment  was  formatted  as  an  input  file  for  the 
Reducseq  program  and  the  output  from  it  subsequently  used 
for  analysis  on  PAUP  (Phylogenetic  Analysis  Using 
Parsimony) software (Swofford 1985).

In these analyses transitions and transversions are
given equal weight and gap positions inferred from the
alignment are treated as missing data. The alltrees option
was employed to conduct an exhaustive search of unrooted
networks for the five seed plant taxa treating all
characters as unordered. The Chlamydomonas sequence was
then restored to the data matrix and designated as the
outgroup. Exhaustive searches of rooted networks for the
five seed plant taxa were conducted using global outgroup rooting.
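
For readers unfamiliar with how steps are counted for unordered characters, the sketch below scores a tree with Fitch's algorithm, a standard method for this purpose. Whether PAUP uses exactly this procedure internally is not stated here, so the code should be read as an illustration only. The tree encoding (nested 2-tuples), the function names, and the toy alignment are all hypothetical; the toy bases are not the actual ss rRNA data. Gaps are treated as missing data, as in the analyses described above.

```python
STATES = frozenset("ACGT")


def fitch_length(tree, column):
    """Minimum number of state changes for one character on a binary tree.

    tree   -- nested 2-tuples of taxon names, e.g. (("a", "b"), ("c", "d"))
    column -- dict mapping taxon name to its base; '-' or '?' is missing data
    For unordered characters the count does not depend on where the root is
    drawn, so an unrooted network can be scored from any rooting.
    """
    def visit(node):
        if isinstance(node, str):
            base = column[node]
            return (STATES if base in ("-", "?") else frozenset(base)), 0
        left, right = node
        left_set, left_steps = visit(left)
        right_set, right_steps = visit(right)
        shared = left_set & right_set
        if shared:
            return shared, left_steps + right_steps
        # No state in common: one change is required at this node.
        return left_set | right_set, left_steps + right_steps + 1

    return visit(tree)[1]


def tree_length(tree, alignment):
    """Sum Fitch lengths over all columns of an alignment {taxon: sequence}."""
    n_columns = len(next(iter(alignment.values())))
    return sum(
        fitch_length(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
        for i in range(n_columns)
    )


# Toy data (hypothetical bases, three characters per taxon).
toy = {"ginkgo": "AAG", "zamia": "AAG", "soybean": "GAA",
       "corn": "GCG", "rice": "GCA"}
tree_one = ((("ginkgo", "zamia"), "soybean"), ("corn", "rice"))
tree_two = ((("ginkgo", "corn"), "soybean"), ("zamia", "rice"))
```

An exhaustive search simply applies `tree_length` to every possible topology and retains the shortest.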

The  total  character  set  (95  positions)  was  divided  into 
subsets  for  further  analysis.  Transition  character  (57) 
and  transversion  character  (38)  subsets  were  analyzed 
separately.  Several  character  positions  contain  both 
classes  of  substitution.  However,  only  one  class  of 
substitution  was  phylogenetically  informative  and  each  of 
these  was  included  in  the  corresponding  subset.  The  Ginkgo 
sequence  was  compared  with  the  secondary  structure  model 
for  the  Zea  mays  ss  rRNA  sequence  (Gutell  et  al.  1985)  and 
characters  were  separated  into  three  subsets.  Characters 
from unpaired loop forming regions (38), paired stem
forming  regions  (29)  and  those  from  a  region  (ca.  +650-850 
bp)  of  undetermined  secondary  structure  were  analyzed 
separately  using  methods  described  for  the  total  data  set. 

Observed  frequencies  for  the  twelve  possible  types  of 
nucleotide  substitution  were  estimated  for  the  green  plant 
ss  rRNA  sequences.  The  313  variable  positions  were 
correlated  with  tree  I  (Fig.  5)  and  the  minimum  number  of 
observed  substitutions  tabulated.    Observed  substitution 
frequencies between the two Ginkgo sequences Gbr-1000 and
Gbr-6700  were  also  estimated. 
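
The tabulation of the twelve substitution types for a pair of aligned sequences can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, the direction of change is arbitrarily taken from the first row to the second, and columns containing gaps or ambiguous bases are skipped. It does not reproduce the tree-based tabulation of minimum substitutions used for the six-taxon data.

```python
from collections import Counter
from itertools import permutations

BASES = "ACGT"


def substitution_counts(row_from, row_to):
    """Count each of the twelve ordered substitution types between two aligned rows."""
    if len(row_from) != len(row_to):
        raise ValueError("aligned rows must have equal length")
    # Start every one of the 12 ordered pairs at zero so absent types are visible.
    counts = Counter({pair: 0 for pair in permutations(BASES, 2)})
    for x, y in zip(row_from.upper(), row_to.upper()):
        if x in BASES and y in BASES and x != y:
            counts[(x, y)] += 1
    return counts
```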

The  Gbr-6700  sequence  was  aligned  with  that  of  Gbr- 
1000  and  sequence  similarities  examined.  The  region  of  the 
Gbr-6700  sequence  representing  the  1100  nucleotide  insert 
in  the  ss  rRNA  like  coding  region  was  compared  with 
sequences  from  the  Genbank  sequence  files  using  the  search 
programs  of  Microgenie. 


CHAPTER  4 
PHYLOGENETIC  ANALYSIS  OF  PLANT  SS  RRNA  GENE  SEQUENCES 


Experimental  Results 

Approximately  100,000  recombinant  lambda  clones  were 
screened  on  duplicate  library  lifts  using  the  Zamia 
ribosomal  probes.  The  Sal  I  fragment,  ca.  6.8  kb,  from 
lambda  Gbr-01  subcloned  in  pUC  19,  pGbr-1000,  included  the 
complete  ss  rRNA  coding  region  as  indicated  by 
hybridization  with  separate  5'  and  3'  ribosomal  probes 
(Fig. 1). A 2000 bp region spanning the coding region was
sequenced.  Boundaries  for  the  ss  rRNA  coding  region  were 
inferred  by  consensus  with  other  eukaryotic  ss  rRNA 
sequences.  The  coding  region  for  the  Ginkgo  ss  rRNA 
inferred  from  this  gene  sequence  is  1811  nucleotides  in 
length.  The  sequence  similarities  for  pairwise  alignments 
of  the  five  seed  plant  sequences  range  from  92%  to  97%  and 
these  are  87%  to  88%  similar  to  the  sequence  for  the  green 
alga Chlamydomonas (Table 1).

An  alignment  of  the  six  ss  rRNA  sequences  was 
constructed  and  is  1825  positions  in  length  including  gaps 
introduced to facilitate alignment (Fig. 2). The majority
of positions for these sequences are readily aligned due to
the relatively high sequence similarities and the similar
lengths of the plant small subunit rRNA sequences.
However,  alignment  of  some  positions,  primarily  in  the 
Chlamydomonas  sequence,  with  the  other  sequences  remains 
ambiguous.  These  positions  are  typically  observed  in 
regions  with  both  length  variation  due  to  insertional- 
deletional  events  and  low  sequence  similarity  of  positions 
adjacent  to  the  introduced  gaps.  These  positions  were 
deleted  from  the  data  matrix  and  subsequently  treated  as 
missing  data  in  phylogenetic  analyses. 

The  alignment  was  formatted  as  an  input  file  and  run 
through  the  sequence  reduction  program  (Reducseq)  developed 
for use with PAUP (Swofford 1985). This removes invariant
and  uninformative  variable  positions  from  the  data  set.  Of 
the  1825  positions,  313  showed  nucleotide  variation  and  of 
these,  95  positions  were  potentially  phylogenetically 
informative  (Fig.  2).  These  were  used  to  infer 
phylogenetic  trees  using  PAUP. 
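
The reduction step can be illustrated with the usual definition of a potentially informative position: at least two character states must each occur in at least two taxa. The sketch below applies that definition; it is not the Reducseq program, whose internal details are not given here, and the function names and the treatment of N as missing data are assumptions.

```python
from collections import Counter

MISSING = set("-?N")


def classify_site(column):
    """Return 'invariant', 'uninformative', or 'informative' for one column.

    column -- string with one base per taxon; gaps count as missing data.
    """
    counts = Counter(base for base in column.upper() if base not in MISSING)
    if len(counts) <= 1:
        return "invariant"
    states_seen_twice = sum(1 for n in counts.values() if n >= 2)
    return "informative" if states_seen_twice >= 2 else "uninformative"


def informative_columns(rows):
    """Indices of potentially informative columns in a list of aligned rows."""
    return [
        index
        for index, column in enumerate(zip(*rows))
        if classify_site("".join(column)) == "informative"
    ]
```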

An  exhaustive  search  of  possible  unrooted  networks  for 
the  five  seed  plant  sequences  was  conducted.  A  single  most 
parsimonious  network  was  found  (Fig.  3)  which  required  120 
steps  (character  state  changes)  with  a  consistency  index 
value  of  0.867.  The  distribution  of  scores  for  the  15 
possible network topologies was examined (Fig. 4). The
next  best  competitor  requires  an  additional  23  steps  at  143 
with  remaining  network  scores  ranging  to  189  steps.  The 
network  at  143  steps  splits  the  two  grasses  grouping 
soybean  and  corn,  with  rice  in  an  intermediate  position. 
In the network at 147 steps corn and rice switch and the
next  two  at  152  and  153  steps  split  Ginkgo  and  the  cycad. 
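
The consistency index quoted above is the ratio of the minimum number of steps the characters could require on any tree to the number of steps actually required on the tree in question. A minimal sketch follows, with hypothetical function names. As a check on the arithmetic, a network of 120 steps with an index of 0.867 implies a minimum of about 104 steps (0.867 x 120); that figure is derived here and is not stated in the text.

```python
def minimum_steps(column, missing="-?"):
    """Fewest changes any tree could need: distinct observed states minus one."""
    states = {base for base in column.upper() if base not in missing}
    return max(len(states) - 1, 0)


def consistency_index(columns, observed_steps):
    """Ratio of minimum conceivable steps to the steps observed on a tree."""
    if observed_steps <= 0:
        raise ValueError("observed_steps must be positive")
    return sum(minimum_steps(column) for column in columns) / observed_steps
```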

A  statistical  analysis  was  performed  to  correlate 
character  state  changes  with  network  A  (120)  and  network  B 
(143) (Table 2). Of the 95 characters examined, 33 favor
one  topology  over  the  other.  The  most  parsimonious  network 
(A)  requires  one  less  substitution  each  for  28  of  these 
positions.  Network  B  is  supported  by  only  five  of  these 
characters.  To  examine  the  significance  of  these  values  a 
two-tailed sign test (Sokal and Rohlf 1969) was used.
Confidence  coefficients  were  inferred  from  a  standard 
statistical  table  of  confidence  limits  for  percentages. 
For  a  sample  size  (n)  of  33  with  five  positions  favoring 
network  B  (Y=5)  the  table  indicates  that  the  results  are 
significant at the 95% confidence interval.

The  outgroup  sequence  from  Chlamydomonas  was  then 
included  and  an  exhaustive  search  for  rooted  trees  was 
conducted.  A  single  most  parsimonious  tree  was  found  that 
requires  149  steps  with  a  consistency  index  value  of  0.78 
(Fig. 5). This tree roots network A along the branch
between  the  gymnosperm  node  and  the  angiosperm  node.  The 
next  shortest  tree  requires  an  additional  five  steps  at  154 
and  places  the  root  along  the  branch  between  Zamia  and  the 
gymnosperm  node.  Remaining  tree  scores  range  from  157  to 
236  (Fig.  6) .   The  five  shortest  trees  (149-164)  represent 
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five  of  seven  possible  rootings  of  the  most  parsimonious 
network  found. 
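
The counts of 15 possible networks, 105 possible trees and
seven possible rootings follow from standard formulas for
fully resolved (bifurcating) topologies. A minimal sketch,
with hypothetical function names, assuming all taxa are
distinct and all trees are binary:

```python
def double_factorial(k: int) -> int:
    """Product k * (k - 2) * (k - 4) * ... down to 1 or 2."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_topologies(n_taxa: int) -> int:
    """Number of unrooted binary topologies, (2n - 5)!! for n >= 3."""
    return double_factorial(2 * n_taxa - 5)


def branches_unrooted(n_taxa: int) -> int:
    """Number of branches in an unrooted binary topology, 2n - 3."""
    return 2 * n_taxa - 3


networks_five_taxa = unrooted_topologies(5)   # ingroup networks
rootings_per_network = branches_unrooted(5)   # places to attach an outgroup
trees_six_taxa = unrooted_topologies(6)       # ingroup plus outgroup

print(networks_five_taxa, rootings_per_network, trees_six_taxa)
```

Each of the 15 ingroup networks can be rooted on any of its
seven branches, which accounts for the 105 trees examined when
the outgroup is included.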

A statistical analysis was performed to correlate
minimal substitutions with the most parsimonious tree, I,
and the next best competitor, II (Table 3). The sign test
was employed to examine the confidence limits for these two
phylogenies. Thirteen characters favor one topology over
the other: nine require one fewer step on tree I and four
require one fewer step on tree II. Again the table for
confidence limits of percentages was used. It indicates
that the difference in the number of characters supporting
each topology is not significant at the 95% confidence
level.

Five  character  subsets  were  used  to  infer  unrooted 
networks  for  the  ingroup  taxa  and  rooted  phylogenies  using 
the green alga outgroup sequence. Each of these subset
analyses infers a single most parsimonious network (Fig. 3)
for the ingroup taxa, identical in topology to the most
parsimonious network (A) found in the analysis using all
characters. For each of these subset analyses the next
most parsimonious network requires at least six additional
steps over the most parsimonious network found, and overall
patterns of network distributions were similar (Fig. 4).
The number of characters in the data subsets ranged from 28
to 57. For each of the subset analyses, network scores
were also converted to percentages to compare network
distributions from analyses containing different numbers of
characters. Distributions of network percentages (Fig. 7)
were used for comparisons but are not intended to represent
a measure of resolution provided by the character subsets.
These distributions reveal that for all subsets examined
the next most parsimonious networks range from
approximately 10 to 25 percent longer than the most
parsimonious network found.
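
The conversion of network scores to percentages can be shown
with the scores reported above for all characters. This is a
minimal sketch following the definition given in the legend
of Fig. 7 (additional steps divided by the steps of the most
parsimonious network); the function name is hypothetical.

```python
def percent_longer(score: int, best: int) -> float:
    """Additional steps over the best network, as a percentage of the best."""
    return 100.0 * (score - best) / best


# Scores of the five shortest networks for all characters (this section).
all_character_scores = [120, 143, 147, 152, 153]
best_score = min(all_character_scores)

percentages = [round(percent_longer(s, best_score), 1)
               for s in all_character_scores]
print(percentages)
```

For the full data set the next best network (143 steps) is
about 19 percent longer than the most parsimonious network.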

Rooted phylogenies were also inferred using the
character subsets. The transversion subset infers two
equally most parsimonious trees, tree I and tree IV, from
the data (Figs. 5 and 6). Transition characters infer a
single most parsimonious tree, I, and the next best
competitor requires an additional four steps. Analysis of
loop positions from unpaired regions produces two equally
most parsimonious trees, I and II. Stem positions from
paired regions produce a single most parsimonious tree, I,
in the analysis, but the next best competitor, tree IV,
requires only one additional step. Remaining characters
from regions of undetermined secondary structure also
produce a single most parsimonious solution, tree I, in
this analysis, with the next most parsimonious tree, II,
requiring two additional steps.

Observed  frequencies  of  the  twelve  possible  nucleotide 
substitutions  for  the  plant  ss  rRNA  sequences  were 
examined. These substitution frequencies estimate the
minimum number of substitutions for the 313 variable
positions required to correlate the data with tree I found
in the analysis. For this analysis, transitional
substitutions are observed at a higher frequency than
transversions (Table 4). Substitutions between C and T are
most  frequent,  18-20%  of  the  total,  and  are  observed  at 
approximately  twice  the  frequency  of  substitutions  between 
A  and  G  which  represent  8.5-10%  of  total  substitutions. 
The  eight  possible  transversion  types  are  observed  at 
similar  frequencies  ranging  from  approximately  4-7%  of  the 
total  number  of  substitutions  observed. 
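
The grouping of the twelve substitution types into transitions
and transversions can be checked against the counts in Table
4. The sketch below uses those counts directly; the helper
name and dictionary layout are illustrative assumptions, not
part of the original analysis.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a: str, b: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    pair = {a, b}
    return a != b and (pair <= PURINES or pair <= PYRIMIDINES)


# Observed numbers of substitutions (from, to), taken from Table 4.
counts = {
    ("A", "C"): 15, ("C", "A"): 17,
    ("A", "T"): 18, ("T", "A"): 20,
    ("C", "G"): 25, ("G", "C"): 30,
    ("G", "T"): 23, ("T", "G"): 28,
    ("A", "G"): 35, ("G", "A"): 42,
    ("C", "T"): 82, ("T", "C"): 72,
}

total = sum(counts.values())
transitions = sum(n for (a, b), n in counts.items() if is_transition(a, b))
transversions = total - transitions

print(f"total = {total}")
print(f"transitions = {transitions} ({100 * transitions / total:.1f}%)")
print(f"transversions = {transversions} ({100 * transversions / total:.1f}%)")
```

With these counts the four transition types account for more
than half of all substitutions even though they are only four
of the twelve possible types.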

Congruence  of  seed  plant  phylogenies  inferred  from  ss 
rRNA  sequences  and  those  inferred  from  5S  rRNA  sequences 
was  examined.  Six  plant  5S  rRNA  sequences  were  selected 
from  those  available  which  represent  the  same  taxonomic 
groups  as  those  available  for  ss  rRNA.  The  5S  sequences 
are available for Ginkgo and the outgroup Chlamydomonas.
The 5S rRNA sequences for the remaining four species in the
ss rRNA analysis are not available. Therefore the 5S
sequences for Cycas revoluta, Vicia faba, Secale cereale,
and Triticum aestivum (Hori et al. 1985) were used in place
of Zamia pumila, Glycine max, Oryza sativa, and Zea mays.

The  cladistic  parsimony  methods  described  for  the  ss 
rRNA  analysis  were  applied  to  these  5S  rRNA  data.  The 
alignment  of  the  5S  rRNA  sequences  is  120  nucleotide 
positions  in  length  and  of  these,  twelve  positions  are 
potentially phylogenetically informative. A single most
parsimonious network was inferred from the 5S rRNA data
which required 13 steps (consistency index 1.0) and has the
same topology as network A inferred from the ss rRNA data.
The two next best competitors each require an additional
five steps, at 18 steps. Of the remaining networks, two
require 19 steps and ten require 25 steps (Fig. 8).
The most parsimonious rooted tree using the Chlamydomonas
sequence as the outgroup requires 14 steps and is identical
in topology to tree IV inferred from the ss rRNA data at 160
steps (+11). This tree roots network A between the node
for the dicot sequence and the node for the two monocots.
The next best competitor, at 15 steps, requires only one
additional step and has the same topology as the most
parsimonious rooted tree, I, found in the ss rRNA analysis.

The 5S rRNA sequence from a more closely related
outgroup is available for the fern Dryopteris acuminata.
The root placement using this sequence in place of the
algal sequence infers a single most parsimonious tree with
the same topology as tree I (Fig. 5) found in this analysis
of ss rRNA sequences.

Figure 2. An alignment of ss rRNA gene sequences from five
seed plant species and a green alga. Numbers to
the left of the alignment indicate the species
included as follows: 1) Chlamydomonas
reinhardtii; 2) Ginkgo biloba; 3) Zamia pumila;
4) Glycine max; 5) Oryza sativa; 6) Zea mays.
Positions of the alignment are indicated above
the alignment by numbers every 100 bp and
vertical lines every 50 bp. Positions of the
alignment that were deleted a priori from the
data matrix are underlined. Character positions
are indicated by a * and synapomorphic
insertions by a ^ above the alignment. Gaps
introduced in individual sequences to facilitate
alignment are indicated within the sequence by a
hyphen (-).

I  •  •       *»  100

1    C A GC..-^ 

2  TACCTGGTTGATCCTGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTGTAAGTATGAACTCTTTCAGACTGTGAAACTGCGAATGGCTC 

3   C A...TG...G 

4  ...T AA 

5   C AA...GA 

6   C AA.  ..GA 


1    A -X^ -G..C.A T 

2  ATTAAATCAGTTATAGTTTCTTTGATGGTACCTTACTACTCGGATAACCGTAGTAATTCTAGAGCTAATACGTGCACCAAATCCCGACTTCTGGAAGGGA 

3   TC.G T.T 

4   G -TC. A C 

5   G -.G.G A C C..G G 

6 G -.G.G A C C..G G 

•  •***!•  *  ••  *300 

1  ..T -.GC T -AC..G. A TC...A T.T.  .^G^.C.  .C. .  .A. .  .T.T 

2  CGCATTTATTAGATAAAAGGCCGACGCGGGCTC-GCCCGCTGCTTCGGTGATTCATGATAACTCGACGGATCGCACGGCCCTGGTGCCGGCGACGCTTCA 
3 C T TT G.CG..T A C..T..T T...T C.A 

4  T T.A..A.A T...T.T T.A T T.T A... 

5   T C A. C.A C A... 

6 T T C.A.  C.A T T.C A... 

•  I  400 

1    A G 

2  TTCAAATTTCTGCCCTATCAACTTTCGATGGTAGGATAGAGGCCTACCATGGTGGTGACGGGTGACGGAGAATTAGGGTTCGATTCCGGAGAGGGAGCCT 

3   C C 

4   T 

5   G 

6  G 

I  *         *     * 

1    G.T C - C G.-T. 

2  GAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCAATCCTGACACGGGGAGGTAGTGACAATAA-ATAACAATACTGGGCTC-AT 

3   - -.. 

4   - C -.. 

5    T C G.-T. 

6   - C G.GT. 

*  *  I  600 

1  ..C - 

2  CGAGTCTGGTAATTGGAATGAGTACAATCTAAATCCCTTAACGA-GGATCCATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAAT 

3  - 

4  T T A 

5  A.T - 

6  A.T - 


1    T.C...T-G G.TG ..C--- T.CT.TG-CT.CA.. 

2  AGCGTATATTTAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGATCTTGGGC-CGGGTCGGCCGGTCCGCCTTTT-CGGTGTGCACCGGCCGC-TCCGTCC 

3   A-...CC T TG T-.T 


4   C T-T AT ..C---. T..G-CT 

5   C ^G...CCG ..CA--...CAG A..TG-CT.  .A.. 

6   C ^CG..CCG..TGCCG .GTA-- CA.A A..  .GTCT.  .A. . 

**  ****|*  *  **800 

1  T.C G.A..G G...C.C..T -A.T...AGT AG GT T G 

2  CTTCTGCCGGCGGCGCGCTCCTGGCCTTAATTGGCTGGG-TCGCGGCTCCGGCGCCGTTACTTTGAAAAAATTAGAGTGCTCAAAGCAAGCCTACGCTCT 

3  ...T..TT A C.T - T A...T T..T 

4 

5    

6 


AT 

T... 

...C. 

..c. 

_ 

..T.C... 

T 

.T 

G 

AT 

...c. 

..c. 

T 

..T.C... 

G 

AT 

AT 

...c. 

..c. 

..T.C... 

. 

G 

AT
*  *  ***  *  I  *  •  *  900

1    A C - -.C T..GT G GTA 

2  GAATACATTAGCATGGAATAACGCGATAGGAGTCTGGTCCTATTGTGTTGGCCTTCGGGACCGGAGTAATGATTAATAGGGACGGTCGGGGGCATTCGTA 

3   GT T C C 

4  .T G A.C.C T A T C A 

5  .G G ATC T..C - T A 

6  .G G ATC T..C T A 

*  *         I  1000 

1  ..C.G C.G AT AC G 

2  TTTCATTGTCAGAGGTGAAATTCTTGGATTTATGAAAGACGAACCACTGCGAAAGCATTTGCCAAGGATGTTTTCATTAATCAAGAACGAAAGTTGGGGG 

3  

4 A A 

5  A A 

6 A A 

I*  *  **  .  1100 

1    T G T...A CT...G.T T...A.- A 

2  CTCGAAGACGATCAGATACCGTCCTAGTCTCAACCATAAACGATGCCGACTAGGGATCGGCGGATGTTGCTTTAAGGACTCCGCCGG-CACCTTGTGAGA 

3  C C - A 

4  C A T T..- A 

5  C AT - A 

6  C A - A..AAT C T..C A 

*  I  *  1200 

1    C 

2  AATCAAAGTTTTTGGGTTCCGGGGGGAGTATGGTCGCAAGGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATT 

3  

4  C - 

5   C C....G 

6 C C 

*  I     *  1300 

1    CG.G G 

2  TGACTCAACACGGGGAAACTTACCAGGTCCAGACATAGTAAGGATTGACAGATTGAGAGCTCTTTCTTGATTCTATGGGTGGTGGTGCATGGCCGTTCTT 

3  C 

4   - C 

5   C C 

6  C C T 

I  *  *******  1400 

1    GTT.CC A G G A. .  ■- .CA.  .ATC.CAC.TG.GGT. -C. .  .GA 

2  AGTTGGTGGAGCGATTTGTCTGGTTAATTCCGTTAACGAACGAGACCTCAGCCTGCTAACTAGCTATGCGGAGGTTCGCCTTCGTGGCCAGCTTCTTAGA 

3  G C G.TT.T 

4   C A T AAC...C.AC 

5   CCATC...C..CA..T 

6   CCATC...C...A.TT 

I 

1    T -G..T..C.A A G C.CGAC. 

2  GGGACTA-TGGCCCTTCAGGCCATGGAAGTTTGAGGCAATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACACTGATGTATTCA 

3  - G..T 

4  - GC.T C 

5  - G..T C - C 

6 - G..T GC- C. 


1    C --...T T IG-T...CCG.G.- T AG C G.

2  ACGAGTCTATAACCTGGGCCGAGAGGCCCGGGAAATCTGCCGAAATTTCAT-CGTGATGGGGATAGATCATTGCAATTATTGATCTTAAACGAGGAATTC 

3   G T G - C 

4   G...T C...T T TT- - GT G...G C 

5   A G T C T TGG - AC...G...G C G. 

6 A G...T C --..T TGG G G...G C G. 

I  *  1700 

1    G T GG..TG.T 

2  CTAGTAAGCGCGAGTCATCAACTCGCGTTGACTACGTCCCTGCCCTTTGTACACACCGCCCGTCGCTCCTACCGATTGAATGATCCGGTGAAGTGTTCGG 

3  G 

4  G G 

5  G G 

6 G G 

*****  *  1800 

1  ■■T.A..--TT.G.T.G..--.AA.-.T...CT-T.--.T-T A C.CC..CC 

2  ATCGCGCCGACGACGGCGGTTCGCCGCCGGCGA-CGT-CG-CGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCGTAACAAGGTTTCCGT 
3 T T.GGCA... -...-..- 

4  ..T...G TGA T...C -...-T.-T C.C 

5   G GG CC... -...-..- C 

6  .G.T..G.CG.AC--- CC...C...C..C C 

1825 

1    

2  AGGTGAACCTGCGGAAGGATCATTG 

3  

4  

5  

6  


Figure 3. The fifteen possible topologies for unrooted
networks of the five seed plant taxa. The
number of steps (character state changes)
required for each network is indicated below
the network for: a) all characters; b)
transversion characters; c) transition
characters; d) loop characters; e) stem
characters; f) undetermined secondary
structure position characters. Taxa included
are: Ginkgo biloba (G.b.); Zamia pumila
(Z.p.); Glycine max (G.m.); Oryza sativa
(O.s.); and Zea mays (Z.m.).


Figure  4.  The  distributions  of  network  scores  (Fig.  3). 
The  horizontal  axis  represents  the  number  of 
steps  required  for  each  network  and  the 
vertical  axis  indicates  the  number  of  networks 
found  requiring  that  number  of  steps.  The  data 
set  used  for  each  analysis  and  the  number  of 
characters  in  the  data  set  are  listed  to  the 
right of the histograms.

Figure 5. The most parsimonious tree (I) and the three
next most parsimonious trees (II-IV) found in
the analysis using the green algal sequence as
the outgroup. Numbers next to trees indicate
the number of steps for each internode
inferred from the total data set. Represented
taxa are: Ginkgo biloba (G.b.); Zamia pumila
(Z.p.); Glycine max (G.m.); Oryza sativa
(O.s.); Zea mays (Z.m.); and Chlamydomonas
reinhardtii (C.r.). Numbers listed below the
trees indicate the number of steps required
for each of the trees from: a) total data; b)
transversions; c) transitions; d) loop
positions; e) stem positions; and f)
undetermined secondary structure positions.

Tree      a.     b.     c.     d.     e.     f.

I        149     60     89     59     44     46
II       154     61     93     59     47     48
IV       160     60    100*    65     43     52


Figure  6.  Distribution  of  scores  for  the  105  possible 
topologies  for  six  taxa  trees.  The  horizontal 
axis  represents  the  number  of  steps  required 
for  trees  and  the  vertical  axis  indicates  the 
number  of  trees  found  for  each  score.  Numbers 
within  the  open  boxes  at  the  top  of  the 
histograms  indicate  the  percentage  of  steps 
required  for  each  tree  over  the  most 
parsimonious  tree  found  for  each  character 
set.  The  character  set  used  for  each  analysis 
is  indicated  to  the  right  of  the  histograms. 


[Figure 6 histograms. Panels: a. All Positions, 95
Characters; b. Transversions, 38 Characters; c. Transitions,
57 Characters; d. Loop Positions, 38 Characters; e. Stem
Positions, 29 Characters; f. Undetermined Structure, 28
Characters. Horizontal axes: number of steps. Vertical axes:
number of trees (0 to 10). Open boxes mark 5%, 10%, 20% and
50% above the most parsimonious tree.]


Figure  7.  The  distribution  of  network  percentages  for  each 
of  six  character  subsets.  Percentages  were 
calculated  for  network  scores  by  dividing  the 
number  of  additional  steps  required  for  each 
network  over  the  most  parsimonious  network  by 
the  number  of  steps  required  for  the  most 
parsimonious  network.  The  horizontal  axis 
represents  these  percentages  and  the  vertical 
axis  indicates  the  number  of  trees  found  within 
each  percentage  interval.  The  character  set 
used  for  each  analysis  is  shown  to  the  right  of 
the histograms.

Figure 8. Distributions for unrooted networks and rooted
phylogenetic trees inferred from this reanalysis
of 5S rRNA sequences from selected
taxa.  The  horizontal  axis  represents  the 
number  of  character  state  changes  for  each 
network  or  tree  and  the  vertical  axis  indicates 
the  number  of  these  found  in  the  analysis.  The 
histograms  represent  scores  for:  a)  unrooted 
five  taxa  networks;  and  rooted  trees  using  b) 
the  green  alga  sequence  and  c)  the  fern 
sequence as the outgroup.

Table 1. Sequence Similarities for Plant ss rRNA Genes

         G.b.    Z.p.    G.m.    O.s.    Z.m.

C.r.     88.4    87.3    86.8    87.3    86.6
G.b.             95.9    94.0    94.0    92.4
Z.p.                     92.3    92.3    91.0
G.m.                             94.5    93.5
O.s.                                     96.9

Taxa included are: Chlamydomonas reinhardtii (C.r.);
Ginkgo biloba (G.b.); Zamia pumila (Z.p.); Glycine max
(G.m.); Oryza sativa (O.s.); and Zea mays (Z.m.).

Table 2. Characters for Statistical Analysis of Networks

CHARACTER  CHARACTER STATE           NUMBER OF       SIGN TEST
NUMBER                               SUBSTITUTIONS   SCORE
           G.b. Z.p. G.m. O.s. Z.m.  NTW A  NTW B    NTW A  NTW B

 1          T    T    T    C    C      1      2       +1
 4          G    G    G    A    A      1      2       +1
 8          T    T    T    C    C      1      2       +1
 9          A    A    A    G    G      1      2       +1
10          A    A    A    G    G      1      2       +1
11          C    C    C    T    T      1      2       +1
12          C    G    C    A    A      2      3       +1
13          T    T    T    C    C      1      2       +1
19          C    C    T    C    T      2      1              -1
23          T    T    T    G    G      1      2       +1
24          A    A    A    T    T      1      2       +1
26          A    A    A    T    T      1      2       +1
31          C    C    C    G    G      1      2       +1
32          C    C    T    C    T      2      1              -1
33          G    G    G    A    A      1      2       +1
37          T    T    T    A    A      1      2       +1
46          T    T    T    A    A      1      2       +1
47          A    A    A    T    T      1      2       +1
51          C    G    C    T    T      2      3       +1
53          T    T    T    C    C      1      2       +1
60          G    G    A    G    A      2      1              -1
61          T    T    T    A    A      1      2       +1
63          C    C    T    C    T      2      1              -1
65          A    A    A    C    C      1      2       +1
69          G    G    G    C    C      1      2       +1
70          T    G    T    C    C      2      3       +1
74          G    G    G    A    A      1      2       +1
75          C    C    C    T    T      1      2       +1
77          T    T    T    C    C      1      2       +1
78          C    C    C    A    A      1      2       +1
80          G    G    T    G    T      2      1              -1
88          T    T    T    G    G      1      2       +1
94          G    A    G    C    C      2      3       +1

Number of Characters Supporting Each Network         +28     -5

Total Non-zero Scores                                 33

Character states for positions favoring one topology
over the other for networks A and B (Fig. 3). A standard
statistical table was used to interpolate confidence
intervals for the sample size (n) of 33 with five (Y)
positions supporting the alternative network B over the
most parsimonious network A.

Table 3. Characters for Statistical Analysis of Phylogenies

CHARACTER  CHARACTER STATE                NUMBER OF         SIGN TEST
NUMBER                                    SUBSTITUTIONS     SCORE
           C.r. G.b. Z.p. G.m. O.s. Z.m.  TREE I  TREE II   TREE I  TREE II

16          A    T    A    T    T    T      2       1                 -1
17          T    C    T    C    C    T      3       2                 -1
18          T    C    T    C    C    C      2       1                 -1
22          C    T    T    C    C    C      1       2         +1
36          C    T    T    C    C    C      1       2         +1
38          A    G    G    A    A    A      1       2         +1
40          T    G    T    G    G    G      2       1                 -1
50          A    G    G    A    A    A      1       2         +1
56          A    G    G    A    A    A      1       2         +1
58          A    C    C    A    A    A      1       2         +1
80          T    G    G    T    G    T      2       3         +1
83          T    A    A    T    T    T      1       2         +1
87          G    A    A    G    G    G      1       2         +1

Number of Characters Supporting Each Tree                     +9      -4

Total Non-zero Scores                                         13

Character states for positions favoring one topology
over the other for trees I and II (Fig. 5). A standard
statistical table was used to interpolate confidence
intervals for the sample size (n) of 13 with four (Y)
positions favoring the alternative phylogeny II over the
most parsimonious phylogeny I.

Table 4. Substitution Frequencies for Plant ss rRNA Genes

Subst.    No. of    % of       Subst.    No. of    % of
Type      Subst.    Subst.     Type      Subst.    Subst.

A > C       15       3.7       C > A       17       4.2
A > T       18       4.4       T > A       20       4.9
C > G       25       6.1       G > C       30       7.4
G > T       23       5.7       T > G       28       6.9
A > G       35       8.6       G > A       42      10.3
C > T       82      20.1       T > C       72      17.7

The observed frequencies for nucleotide substitution
types were estimated from variable positions of the plant
ss rRNA gene alignment (Fig. 2). Substitutions were
correlated with the most parsimonious tree (Fig. 5) found
in the analysis.

Discussion


Introduction  to  Discussion 

The Ginkgo small subunit rRNA coding region contained
in Gbr-1000 is 1811 nucleotides in length, as determined
by consensus with other eukaryotic sequences. This is
within the range reported for other seed plants, from 1807
nucleotides in soybean to 1813 nucleotides in Zamia.
Length conservation coupled with high sequence identity
makes alignment of most nucleotide positions
straightforward. Only a few positions which surround gaps
in the alignment and show nucleotide variation are not so
easily aligned (Fig. 2).

A basic assumption in a phylogenetic analysis is that
the characters being compared are homologous (share a
common evolutionary origin). In a cladistic study using
molecular sequences, homology of a nucleotide position is
defined by the alignment. For nucleotide positions where
more than one alignment is possible, misalignment will
result in comparing non-homologous characters in the
analysis. To minimize this, several nucleotide positions
where alignment is ambiguous, and homology is consequently
uncertain, were removed from the data matrix (Fig. 2),
mostly in the Chlamydomonas sequence.
The remaining nucleotide positions include 95 potentially
phylogenetically informative positions.

For this study we adopt a cladistic approach using
parsimony. Eukaryotic relationships have been examined
based on ss rRNA sequences using a variety of approaches
(reviewed in Felsenstein 1988); however, the cladistic
parsimony approach has become widely used by plant
systematists.

Ingroup Analysis of Seed Plant ss rRNA Sequences

Ingroup  analysis  of  the  five  seed  plant  taxa  infers  a 
single most parsimonious network (Fig. 3). This network,
A, is 120 steps in length and has a consistency index of
0.87. Network A groups the two grasses together with the
soybean sequence intermediate between the grass node and
the node grouping Ginkgo and Zamia. The next shortest
network, B, requires 23 additional steps, at 143, and splits
the two grasses, placing soybean with corn and placing rice
intermediate between these and the gymnosperms. The four
next most parsimonious networks (B, C, D and E), requiring
between 143 and 153 steps, split either the grass node (B and
C) or the gymnosperm node (D and E), grouping soybean with
one member of the group and placing the other member of the
group in an intermediate position in the network.
Remaining arrangements of the five ingroup sequences (F
through O) require between 180 and 189 steps (Fig. 4).
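
The consistency index quoted here is the ratio of the minimum
number of changes the characters could require to the number
actually required on a given network. A minimal sketch, with
a hypothetical function name. The minimum of 104 steps for
the ss rRNA data is an assumption back-calculated from the
reported index (0.867 at 120 steps) and is not a value stated
in this study; the 5S rRNA case (13 steps, index 1.0) follows
directly from the reported values.

```python
def consistency_index(min_steps: int, observed_steps: int) -> float:
    """Ratio of the minimum conceivable steps to the steps on the tree."""
    return min_steps / observed_steps


# ASSUMPTION: 104 is back-calculated from CI = 0.867 at 120 steps.
ci_ss_rrna_network_a = consistency_index(104, 120)

# An index of 1.0 means no extra steps: minimum equals observed (5S rRNA data).
ci_5s_network = consistency_index(13, 13)

print(round(ci_ss_rrna_network_a, 3), ci_5s_network)
```

An index below 1.0 indicates that some characters change more
often on the network than their number of states requires,
which is the sense in which the index measures homoplasy.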

A  statistical  estimate  of  the  significance  of 
character  support  for  the  most  parsimonious  network  (A) 
over the next best competitor (B) was calculated. An
estimate of the minimum number of substitutions was made by
correlating each character position with network A and with
network B (Table 2). Of the 95 characters, 33 favor one
network over the other by one fewer substitution each.
Characters favoring network A totalled 28, with only five
favoring network B.
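
The per-character substitution counts in Table 2 can be
reproduced with a small-parsimony calculation. The sketch
below uses Fitch's algorithm for unordered characters, which
is a standard method but is an assumption here, since the
procedure used to obtain the counts is not named in the text.
Networks are written as arbitrarily rooted nested tuples (the
step count does not depend on where the root is placed), and
the taxon abbreviations and function name are illustrative.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one character on a binary tree.

    tree:   nested 2-tuples whose leaves are taxon names (str)
    states: dict mapping taxon name -> nucleotide at this character
    """
    steps = 0

    def state_set(node):
        nonlocal steps
        if isinstance(node, str):          # leaf: observed state
            return {states[node]}
        left, right = (state_set(child) for child in node)
        if left & right:                   # shared state: no change needed
            return left & right
        steps += 1                         # disjoint sets: one change
        return left | right

    state_set(tree)
    return steps


# Network A groups the grasses; network B groups soybean with corn.
NETWORK_A = (("Gb", "Zp"), ("Gm", ("Os", "Zm")))
NETWORK_B = (("Gb", "Zp"), ("Os", ("Gm", "Zm")))
TAXA = ("Gb", "Zp", "Gm", "Os", "Zm")

# Three characters from Table 2 (states in the order of TAXA).
characters = {1: "TTTCC", 12: "CGCAA", 19: "CCTCT"}

results = {}
for number, column in characters.items():
    states = dict(zip(TAXA, column))
    results[number] = (fitch_steps(NETWORK_A, states),
                       fitch_steps(NETWORK_B, states))
    print(number, results[number])
```

Characters 1 and 12 require one fewer step on network A,
whereas character 19 requires one fewer step on network B,
in agreement with the corresponding rows of Table 2.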

The nonparametric sign test (Sokal and Rohlf 1969) was
used to examine confidence limits for percentages.
Calculations were made for a sample size (n) of 33 with
only five characters (Y) supporting the alternative network
topology, using a two-tailed test. Results indicate that at
the 5% significance level it is unlikely that the numbers of
characters supporting the two alternatives are equal, as
postulated by the null hypothesis.

The  most  parsimonious  network,  A,  inferred  from  the  ss 
rRNA  sequences  is  highly  congruent  with  the  phenotypic 
data. Other possible topologies (B-O) for these taxa are
not  plausible  alternatives  given  the  morphological  and 
anatomical  characters  for  these  seed  plants.  The  strength 
of  support  for  network  A  from  these  data  is  therefore 
consistent  with  the  assumption  that  ss  rRNA  sequences 
contain  phylogenetically  informative  characters  applicable 
to study at these taxonomic levels.

Analysis of Rooted Phylogenetic Trees

Rooted phylogenetic trees were inferred from the data
using the Chlamydomonas sequence as the outgroup (Fig. 5).
The green algae are widely supported as a basally divergent
group in the green plant lineage and, among the taxa for
which ss rRNA sequences are currently available, are the
closest relatives of the higher seed plants. An exhaustive
search of rooted phylogenies was conducted and inferred
tree distributions were examined (Fig. 6).

A single most parsimonious tree was found which
requires 149 steps and has a consistency index value of
0.78 (Fig. 5). In tree I the algal sequence roots network
A (Fig. 3) along the internode between the two gymnosperms
and  the  angiosperm  node.  Therefore  a  monophyletic  grouping 
of  Ginkgo  and  Zamia  is  inferred  from  the  most  parsimonious 
rooting  of  seed  plant  networks. 

The next two shortest trees found, tree II and tree
III, require an additional five and eight steps,
respectively (Fig. 5). These support a polyphyletic origin
for these two gymnosperm taxa. Tree II, which requires 154
steps, roots network A along the branch leading to Zamia,
suggesting that Ginkgo is more closely related to the
angiosperms than to the cycads. Tree III (Fig. 5), at 157
steps, roots the network along the Ginkgo branch, and this
placement of the root suggests that Ginkgo diverged prior
to the divergence of the cycad lineage from that of the
angiosperms. The five shortest trees found in this
analysis (Fig. 6) represent five of seven possible rootings
of the most parsimonious network, A, found in the ingroup
analysis.

A  statistical  analysis  was  performed  by  calculating 
the  minimum  number  of  nucleotide  substitutions  for  each 
character  position  required  to  correlate  the  sequence  data 
with  the  most  parsimonious  tree,  I,  and  the  next  best 
competitor, tree II. Of 95 character positions in the data,
13 favor one phylogeny over the other by one fewer
substitution each (Table 3). Nine of these support tree I
and four support tree II (Fig. 5). The sign test was
employed to examine the significance of these values. The
test indicates that support for one phylogeny over the
other is not significant at the 95% confidence level.
Therefore, although a monophyletic group is inferred by
using a parsimony criterion, the ss rRNA data using the
algal outgroup support the grouping rather weakly.
Resolution for rooted phylogenies is not as clear as that
obtained in the ingroup analysis. These results suggest
that rates of evolution in ss rRNA sequences may be too
rapid to provide informative characters for such distantly
related taxa (e.g. green algae and seed plants),
particularly if the time between divergence of ingroup taxa
(e.g. Ginkgo and cycads) is brief. Therefore, sequences
from more closely related outgroup taxa are needed in order
to root seed plant phylogenies inferred from ss rRNA
sequences.

Comparison  of  Character  Subsets 

Character  weighting  has  frequently  been  used  in 
phylogenetic  analyses.  Weighting  based  on  transition  and 
transversion  types  of  substitutions  can  be  examined  for 
molecular  characters.  In  rRNA  sequences  the  position  of 
the  character  in  either  an  unpaired,  loop  region  or  a 
paired,  stem  forming  region  can  also  be  used  as  a  basis  for 
character  weighting.  Compensating  mutations  in  stem 
forming  regions  occur  at  high  frequency  and  may  be  under 
positive  selection  pressure.  Studies  using  5S  and  5.8S 
rRNA  sequences  have  suggested  that  it  may  be  desirable  to 
give  less  weight  to  compensating  mutations  or  eliminate 
them from the analysis (Wheeler and Honeycutt 1988).

Wheeler  and  Honeycutt  (1988)  examined  phylogenies 
inferred  from  subsets  of  characters  based  on  stem  or  loop 
regions  of  the  rRNA  molecule  and  from  their  total  data. 
This  approach  was  used  to  examine  subsets  of  characters 
from the ss rRNA data. The 95 characters were separated
into subsets representing different types of characters, and
phylogenetic inferences were made using these subsets.

Characters  were  first  separated  based  on  nucleotide 
substitution  classes  into  subsets  containing  transversions 
(38 characters) and transitions (57 characters). The data
were also divided based on the position within a secondary
structure model into loop region characters (38), stem
region  characters  (29)  and  remaining  characters  (28)  from 
regions  where  secondary  structure  models  have  not  been 
determined. 

The ingroup analysis from each subset supports a
single most parsimonious network identical in topology to
the most parsimonious network found using the total data
set, network A (Fig. 3). Direct comparison of these analyses is
difficult  because  of  the  variation  in  the  number  of 
characters  each  subset  contains.  Network  distributions 
(Fig.  4)  for  each  subset  analysis  were  converted  into 
percentages  reflecting  the  number  of  additional  steps 
required  for  alternative  networks  over  the  most 
parsimonious  network  (Fig.  7).  These  percentage 
distributions  are  used  to  compare  subset  analyses  but  are 
not  intended  as  a  quantitative  measure  of  distribution 
significance. 

The analysis of transversion characters supports a
single most parsimonious network (A) which requires 45
steps for 38 characters (Fig. 3). The consistency index is
0.96 and the next most parsimonious networks (B and E) each
require an additional 13 steps. For transition data the
most parsimonious solution requires 75 steps for 57
characters (network A). The consistency index is 0.81 for
the network and the next best competitor (network B)
requires an additional 10 steps. Consistency index values
for  most  parsimonious  solutions  suggest  that  transversion 
characters  contain  less  homoplasy  than  the  transition 
characters  and  may  therefore  provide  better  resolution  than 
transitions  or  total  characters  when  both  types  of 
substitutions  are  given  equal  weight.  Network  percentage 
distributions  (Fig.  7)  indicate  that  for  all  characters  the 
next  most  parsimonious  network  is  19%  longer  than  the  most 
parsimonious  network.  For  the  transversion  analysis  the 
next  competitor  is  29%  longer  than  the  most  parsimonious 
solution  and  the  transition  subset  supports  a  next  most 
parsimonious  network  13%  longer  than  the  most  parsimonious 
found.  These  results  suggest  that  giving  more  weight  to 
transversion  substitutions  may  reduce  the  amount  of 
homoplasy  in  the  ss  rRNA  sequence  data  and  increase 
resolution.  However,  results  also  indicate  that  transition 
substitutions  provide  informative  characters  and  therefore 
should  be  included  at  some  level  in  phylogenetic  analyses. 

The  analyses  of  character  subsets  based  on  secondary 
structure  of  the  ss  rRNA  molecule  reveal  less  variation  in 
consistency index values and network distributions (Fig.
4). Loop position characters infer a most parsimonious
network  (A)  requiring  50  steps  for  38  characters.  Next 
most  parsimonious  alternatives  (B  and  D)  each  require  an 
additional  11  steps.  The  consistency  index  for  network  A 
from  loop  positions  is  0.86  compared  with  0.85  obtained  for 
stem positions. Therefore this analysis indicates that
these two types of characters contain similar levels of
homoplasy.  The  most  parsimonious  network  for  29  stem 
position  characters  requires  34  steps  with  the  next  best 
competitor  requiring  an  additional  six  steps.  Next  most 
parsimonious  solutions  for  these  data  subsets  are  22%  and 
18%  longer  than  most  parsimonious  networks  for  loop 
characters and stem characters respectively (Fig. 7).
Remaining positions from a region of undetermined secondary
structure also produce a single most parsimonious network
(A) at 36 steps for 28 characters. The consistency index
for the network is 0.89, slightly higher than those from
loop and stem character subsets.
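
The percentages quoted above appear to follow from dividing the additional steps required by the next best network by the length of the most parsimonious network. The short Python sketch below reproduces that arithmetic from the step counts given in the text. The formula is inferred from the reported figures rather than taken from Fig. 7, and the function and variable names are illustrative only.

```python
def percent_longer(best_steps, extra_steps):
    """Extra length of the next best network, as a percentage of the shortest."""
    return 100.0 * extra_steps / best_steps

# (steps in most parsimonious network, additional steps for next best network)
subsets = {
    "transversions": (45, 13),
    "transitions": (75, 10),
    "loop": (50, 11),
    "stem": (34, 6),
}

percentages = {
    name: round(percent_longer(best, extra))
    for name, (best, extra) in subsets.items()
}

for name, value in percentages.items():
    print(f"{name}: next best network is {value}% longer")
```

Under this assumption the four subsets give 29%, 13%, 22% and 18%, matching the values reported for the transversion, transition, loop and stem analyses.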

Analyses  of  phylogenies  using  the  algal  sequence  as 
the  outgroup  root  and  the  character  subsets  listed  above 
were  also  carried  out.  From  the  transversion  subset  two 
equally  parsimonious  trees,  I  and  IV,  are  found  at  60  steps 
each  with  a  consistency  index  of  0.78  (Fig.  5).  Rooting 
the  phylogeny  with  the  algal  sequence  therefore  cannot 
distinguish  between  tree  I,  supported  by  the  total  data, 
and  tree  IV  which  splits  the  angiosperms  and  is  highly 
incongruent  with  morphological  and  anatomical  data.  The 
transition subset does infer a single most parsimonious
solution at 89 steps with a consistency index of 0.78 (Fig.
5). Homoplasy and tree distributions (Fig. 6) are similar
to those observed for the total data set.

Analysis of the loop position subset also infers two
equally  parsimonious  trees,  I  and  II.  The  consistency 
index  value  for  these  trees  is  0.81  compared  to  0.74  for  a 
single most parsimonious tree (IV) inferred from stem
characters (Fig. 5). The next most parsimonious solution
(tree  I)  requires  only  one  additional  step  for  the  stem 
character  subset.  Characters  from  regions  of  undetermined 
secondary  structure  infer  a  single  most  parsimonious  tree 
(I)  which  is  identical  in  topology  to  that  obtained  from 
the  total  data  set.  This  tree  requires  46  steps  and  has  a 
consistency  index  of  0.78. 

None  of  the  data  subsets  examined  appear  to  improve 
the  resolution  of  alternative  rooted  phylogenies  inferred 
from  the  total  data.  Transversion  and  loop  character 
subsets,  which  might  be  expected  to  contain  less  homoplasy, 
fail  to  produce  a  single  most  parsimonious  tree  from  their 
respective  analyses.  A  phylogeny  which  is  highly 
incongruent  with  phenotypic  data  (IV)  is  the  most 
parsimonious  found  using  stem  characters  and  is  among  the 
two  equally  most  parsimonious  found  using  transversion 
characters.  Results  of  this  analysis  suggest  that  the 
rates  of  substitution  for  all  types  of  characters  examined 
are  too  rapid  to  use  the  green  alga  sequence  as  an  outgroup 
root. 

For  the  ingroup  analysis  where  the  total  ss  rRNA  data 
set  provides  good  resolution,  some  differences  are  observed 
in the results from character subset analysis. The
transversion subset appears to contain less homoplasy than
transition characters. Observed frequencies for the twelve
possible substitutions were estimated by correlating the
313 variable positions of the data set with tree I (Fig.
5). The per-type frequency of transitions is about twice
that of transversions, but only when the number of possible
substitution types is taken into account: four for
transitions and eight for transversions (Table 4). Overall,
transitions  represent  57%  of  the  total  substitutions 
estimated  from  these  data.  Frequencies  of  substitution 
types  also  vary  approximately  twofold  among  different 
transversion  types.  A  twofold  difference  in  frequencies  is 
also  observed  for  transition  types  with  substitutions 
between  C  and  T  observed  at  twice  the  frequency  of  those 
between  A  and  G. 
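
As a rough check of the per-type comparison, the overall 57% share of transitions can be divided among the four transition types and the remaining 43% among the eight transversion types. The sketch below assumes only the rounded 57% figure from the text (Table 4 is not reproduced here), so the resulting ratio is approximate. The function and parameter names are hypothetical.

```python
def per_type_ratio(transition_share, n_transition_types=4, n_transversion_types=8):
    """Ratio of the mean frequency per transition type to that per transversion type."""
    transversion_share = 1.0 - transition_share
    per_transition = transition_share / n_transition_types
    per_transversion = transversion_share / n_transversion_types
    return per_transition / per_transversion

# Uncorrected ratio of all transitions to all transversions.
raw_ratio = 0.57 / (1.0 - 0.57)

# Ratio after allowing for four transition and eight transversion types.
ratio = per_type_ratio(0.57)

print(f"raw ratio: {raw_ratio:.2f}, per-type ratio: {ratio:.2f}")
```

Without the correction the two classes are nearly balanced (a ratio of about 1.3). The excess of transitions of twofold or more appears only once the number of possible types is considered.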

The  differences  in  the  frequencies  of  various 
substitution  types  suggest  that  individual  types  of 
substitutions,  in  addition  to  the  overall  ratio  of 
transitions  to  transversions,  might  be  considered  when 
developing  character  weighting  schemes.  Williams  and  Fitch 
(1990)  have  developed  a  method  that,  in  addition  to 
weighting  of  character  positions,  allows  character  state 
transformation  weighting  within  the  character  position. 
Therefore,  this  method  allows  different  weights  to  be  given 
to  the  various  nucleotide  substitution  types. 
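
A character state weighting scheme of this kind can be pictured as a step matrix that assigns a cost to each of the twelve ordered substitutions. The sketch below is not the Williams and Fitch (1990) procedure; it only illustrates the idea. The default costs (1 for transitions, 2 for transversions), the override for C to T changes, and all names are assumptions chosen for the example.

```python
from itertools import permutations

PURINES = {"A", "G"}

def is_transition(a, b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return a != b and ((a in PURINES) == (b in PURINES))

def step_matrix(transition_cost=1.0, transversion_cost=2.0, overrides=None):
    """Cost of every ordered substitution a -> b (twelve entries)."""
    costs = {}
    for a, b in permutations("ACGT", 2):
        costs[(a, b)] = transition_cost if is_transition(a, b) else transversion_cost
    costs.update(overrides or {})
    return costs

def weighted_length(changes, costs):
    """Total weighted cost of a list of inferred (from, to) changes."""
    return sum(costs[change] for change in changes)

# Hypothetical scheme: down-weight the most frequent transition type.
matrix = step_matrix(overrides={("C", "T"): 0.5, ("T", "C"): 0.5})
example_length = weighted_length([("C", "T"), ("A", "G"), ("A", "C")], matrix)
```

Here the three example changes cost 0.5, 1.0 and 2.0, for a weighted length of 3.5, whereas equal weighting would count three steps.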

Congruence of Phylogenies from ss rRNA Sequences and
Phenotypic Characters


The  most  parsimonious  tree  and  the  next  two  shortest 
alternatives  found  in  this  analysis  (Fig.  5)  represent  the 
most  plausible  phylogenies  for  the  seed  plant  taxa 
considered.  Phylogenies  consistent  with  each  of  these  have 
been  proposed  based  on  phenotypic  evidence  (Beck  1966, 
1985,  Meyen  1984,  1986,  Rothwell  1985).  More  recent 
cladistic  studies  examining  seed  plant  phylogeny  have 
reviewed  the  evidence  supporting  these  alternatives.  These 
investigations  include  a  broad  range  of  morphological  and 
anatomical  characters  and  address  most  seed  plant  taxa 
including  fossil  groups  as  well  as  extant  gymnosperms 
(Crane  1985a,  1985b,  Doyle  and  Donoghue  1986,  1987). 
Phylogenies  consistent  with  each  of  the  three  shortest 
trees  found  in  this  analysis  are  found  among  equally  or 
nearly  as  parsimonious  alternatives  in  these  phenotypic 
studies. 

Phylogenies  consistent  with  a  clade  containing  Ginkgo 
and  cycads  (tree  I)  have  been  proposed  based  on  phenotypic 
data.  Extant  seed  plant  taxa  have  been  examined  using  a 
cladistic  approach  and  have  included  ferns  and  fern  allies 
as  outgroups  (Hill  and  Crane  1982).  Three  equally 
parsimonious  cladograms  are  presented,  all  of  which  link 
Ginkgo  with  the  cycads  but  vary  in  their  placement  of 
conifers  and  gnetopsids.  In  portions  of  another  study 
(Doyle and Donoghue 1987) limited to extant seed plant
taxa, a clade containing Ginkgo, cycads and conifers is
found  in  one  of  two  equally  parsimonious  alternatives.  The 
second  separates  Ginkgo  (plus  conifers)  from  the  cycads. 
Other  nearly  as  parsimonious  alternatives  found  also 
separate  Ginkgo  from  the  cycads  as  in  trees  II  and  III 
(Fig.  5)  found  in  this  analysis.  Furthermore,  in  analyses 
by  these  investigators  which  include  fossil  taxa, 
phylogenies  linking  Ginkgo  with  cycads  become  less 
parsimonious  than  those  separating  these  two  groups  (Doyle 
and Donoghue 1986, 1987). Crane (1985a, 1985b) has also
addressed extant and fossil taxa. A consensus cladogram,
one of two presented therein, supports a clade containing
Ginkgo and cycads (plus conifers). However, the second
cladogram retains a Ginkgo-conifer clade but it is
separated from the cycads. The latter arrangement is also
supported in an analysis of extant taxa which includes
other green plant groups (Bremer 1985).

The most parsimonious network, A, inferred from ss
rRNA sequences is strongly supported over the alternatives.
This network is highly congruent with phenotypic evidence
for these taxa. Alternative networks for these taxa are
implausible given the morphological and anatomical evidence
contradicting them. Inferences of the evolutionary
relationships between Ginkgo and cycads and their
relationships to the angiosperms require rooting this
network with an outgroup sequence. Placement of the root
using the algal sequence is relatively weak compared with
the resolution obtained in the ingroup analysis. Use of
sequences from more closely related outgroups (e.g. ferns)
will improve the ability to root networks inferred from ss
rRNA sequences.

Homoplasy  is  a  common  problem  in  plant  systematic 
studies at higher taxonomic levels. This problem persists
in the ss rRNA data as it does in other molecular and
phenotypic  data  sets  for  higher  plant  taxa.  These  problems 
with  the  ss  rRNA  data,  like  results  of  studies  based  on 
phenotypic  data,  suggest  that  trees  II  and  III  (Fig.  5) 
present  plausible  alternatives  to  the  most  parsimonious 
tree,  I,  found  in  this  analysis. 

Additional  Data  from  rRNA  Sequences 

The  characters  used  in  this  PAUP  analysis  included 
only  base  substitutions  observed  in  the  sequence  alignment. 
Differences  in  ribosomal  sequences  caused  by  insertional  or 
deletional  events  occur  at  a  frequency  at  least  an  order  of 
magnitude lower than substitutional events (Zimmer et al.
1989). These were not incorporated into the data matrix
because of uncertainty about how to treat them relative to
substitutions. The majority of the insertion/deletion
events observed are apomorphic changes in individual
sequences. However, two synapomorphic insertional events
are inferred from our alignment (Fig. 2). One supports the
node for the two grasses, corn and rice. The other one
supports the node for Ginkgo and the cycad. Given the low
frequency observed for insertions, it appears unlikely
that this shared insertion results from either independent
insertional events in the two lineages or an insertion
followed by a deletion at the same position, as would be
required to correlate the data with trees II or III.
Therefore, the monophyletic grouping of Ginkgo and cycads
inferred from nucleotide substitutions is also supported
by this insertion.

Other  studies  using  rRNA  sequences  also  appear  to 
provide  support  for  a  clade  containing  Ginkgo  and  the 
cycads.  The  first  is  a  study  using  5S  rRNA  sequences  from 
28 species of green plants (Hori et al. 1985) which
includes Ginkgo, a cycad (Cycas) and a conifer
(Metasequoia). Phylogenies were inferred by a distance
matrix  method  rather  than  cladistic  parsimony.  The 
phylogeny  presented  therein  does,  however,  suggest  a 
grouping  of  Ginkgo,  cycads  and  conifers. 

The 5S sequences for a limited number of taxa were
re-analyzed using the methods described here for ss rRNA
sequences. Like the ss rRNA analysis, the 5S rRNA analysis
infers a single most parsimonious network that is superior
to its competitors under a parsimony criterion. This
re-analysis of 5S rRNA sequences also parallels the ss rRNA
analysis in the weakness of the root placement using the
algal sequence from Chlamydomonas.

Unlike ss rRNA data, a closer outgroup sequence is
available for the 5S rRNA data. The Chlamydomonas sequence
was replaced with that from a fern, Dryopteris acuminata.
The  placement  of  the  root  using  this  sequence  infers  a 
single  most  parsimonious  tree  identical  in  topology  to  that 
found  in  the  ss  rRNA  analysis,  tree  I.  Tree  distributions 
found  using  the  fern  outgroup  (Fig.  8)  more  closely 
parallel those found for the ss rRNA analysis. Therefore,
for this analysis of limited taxa, the most parsimonious
tree inferred from 5S rRNA data using the fern outgroup is
congruent with that from ss rRNA sequences (Fig. 5).

Another  study,  a  cladistic  one  using  PAUP,  utilizes 
partial  sequences  from  both  small  and  large  subunit  rRNA 
and  includes  approximately  1700  nucleotide  positions 
(Zimmer et al. 1989). This study includes Ginkgo, cycad and
conifer  representatives.  A  tree  presented  therein 
representing  "major  features"  of  the  most  parsimonious 
found  also  appears  to  support  a  clade  containing  Ginkgo, 
cycads,  and  conifers. 

The  relationships  of  Ginkgo,  cycads  and  conifers 
inferred  from  phylogenetic  studies  have  generally  been 
supported  by  relatively  few  characters.  Alternative 
arrangements  of  these  taxa  are  found  in  equally  or  nearly 
as  parsimonious  trees.  There  is  considerable  variation  in 
the  number  of  taxa  as  well  as  the  number  and  type  of 
characters  used  in  these  studies.  Therefore,  when  studies 
support different most parsimonious phylogenies, it is
difficult to directly compare the strength of support for
most  parsimonious  solutions  over  alternative  topologies. 
Recent  cladistic  studies  using  morphological  and  anatomical 
data  appear  to  favor  a  paraphyletic  arrangement  of  Ginkgo 
and  cycads  but  grouping  these  taxa  monophyletically  is 
equally  or  nearly  as  parsimonious.  In  contrast,  molecular 
studies  favor  a  monophyletic  grouping  of  these  taxa  over 
paraphyletic  arrangements,  but  the  latter  are  found  as  next 
most  parsimonious  alternatives.  Therefore,  both 
monophyletic  and  paraphyletic  relationships  of  these  taxa 
should  be  considered  plausible  alternative  phylogenies. 

The complete phylogeny of seed plants cannot be
addressed with molecular studies due to the large number of
fossil taxa that are involved in their evolutionary
history. However, further examination of sequences from
extant taxa, including conifers and the gnetopsids, should
aid in clarifying relationships of these groups and may
provide clues as to the polarity of phenotypic characters.
Sequences from more closely related outgroups (i.e.
pteridophytes) will aid in the rooting of seed plant
networks.


CHAPTER  5 
RIBOSOMAL  RNA  SEQUENCES  IN  THE  GINKGO  BILOBA  GENOME 


Experimental  Results 

A  group  of  clones  was  identified  from  the  Ginkgo 
genomic  library  in  lambda  that  contained  a  ca.  12  kb  Bam  HI 
fragment  which  hybridized  to  both  5'  and  3'  ss  ribosomal 
probes.  Restriction  analysis  of  these  clones  yielded 
fragment  sizes  inconsistent  with  those  expected  based  on 
conserved  sites  within  plant  ss  rRNA  coding  regions.  A 
restriction  map  (Fig.  9)  indicated  that  the  inferred  ss 
rRNA  coding  region  spanned  approximately  three  kilobases  of 
DNA,  longer  than  those  reported  for  eukaryotic  ss  rRNA 
sequences. 

A  3770  base  region  covering  the  inferred  ss  rRNA 
coding  region  of  one  of  these  clones,  Gbr-6700,  was 
sequenced. The sequenced region of Gbr-6700 was compared
with that of Gbr-1000 (Fig. 10). The Gbr-6700 sequence
contains a region that is evolutionarily homologous to the
ss rRNA coding region of Gbr-1000 as inferred from the
sequence similarities. This region is interrupted by a 1.1
kb insert approximately 700 bases from the 5' border of the
ss rRNA-like coding region. The ~800 nucleotide region 5'
to the 1.1 kb insert and the 1200 nucleotides 3' to the
insert were aligned with the ss rRNA sequence from
Gbr-1000. For the pairwise alignment of regions homologous
to the ss rRNA coding region, the two Ginkgo sequences
share 86% sequence similarity.
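
A pairwise similarity of this kind can be computed by counting identical positions across the aligned columns. The Python sketch below is a minimal illustration. It skips columns containing a gap, which is an assumption, since the text does not state how gaps were scored. The 30-base example is the short stretch upstream of the coding region shown in Figure 10, assuming the Gbr-1000 fragment aligns with the end of the Gbr-6700 row above it. It is used only to exercise the function and says nothing about the 86% figure.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of identical bases over aligned columns without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not scored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

gbr_6700_fragment = "TTTGTCAAGTCAAATGAAAAGAGATCTCCT"
gbr_1000_fragment = "TTTGTCGAGTCGGATGCGAAGAGATGTCCT"
fragment_identity = percent_identity(gbr_6700_fragment, gbr_1000_fragment)
print(f"{fragment_identity:.1f}% identical over the example fragment")
```

For this fragment 24 of 30 positions match, giving 80%.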

The number of copies of each of the sequences,
Gbr-1000 and Gbr-6700, in the Ginkgo genome was estimated. A
~1440 bp Xba I-Eco RI fragment (Fig. 1) from pGbr-1000 and a
~1440 bp Hind III-Stu I fragment from pGbr-6000 (Fig. 9) were
isolated and used as probes. Xba I-Eco RI digests of
genomic and pGbr-1000 DNA and Hind III-Stu I digests of
genomic and pGbr-6000 DNA were separated by
electrophoresis, blotted and probed using the respective
isolated DNAs. Blot autoradiograms (Fig. 11) were scanned
by laser densitometry (Table 5). The number of copies of
each  sequence  contained  in  the  genomic  lanes  was  estimated 
by  interpolation  of  values  obtained  for  known  quantities  of 
plasmid  DNAs.  This  value  was  divided  by  the  number  of 
genomic  equivalents  in  the  genomic  digest  lanes.  Values 
obtained  indicate  that  the  diploid  Ginkgo  genome  contains 
an  estimated  16,000  copies  of  the  Gbr-1000  sequence  and 
3,400  copies  of  the  Gbr-6700  sequence. 
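
The arithmetic behind these estimates can be sketched as follows. The 20 pg diploid genome size and the lane quantities come from Table 5. The molecule counts for the genomic lanes (8.0 x 10^8 and 1.7 x 10^8) are the values read from that table, with exponents chosen to be consistent with the reported copy numbers. The average mass of 650 g/mol per base pair is an assumed textbook value, not one stated in the text, and the interpolation step itself is not reproduced here.

```python
AVOGADRO = 6.022e23           # molecules per mole
GRAMS_PER_MOLE_PER_BP = 650   # assumed average mass of one base pair

def molecules_from_mass(mass_ng, length_bp):
    """Number of double-stranded fragments of length_bp in mass_ng nanograms."""
    moles = (mass_ng * 1e-9) / (length_bp * GRAMS_PER_MOLE_PER_BP)
    return moles * AVOGADRO

def genome_equivalents(genomic_dna_ug, pg_per_diploid_genome=20.0):
    """Diploid genome equivalents in a lane of genomic DNA."""
    return genomic_dna_ug * 1e6 / pg_per_diploid_genome

def copies_per_diploid_genome(molecules, equivalents):
    """Sequence copies per diploid (2C) genome."""
    return molecules / equivalents

equivalents = genome_equivalents(1.0)                      # 1.0 ug genomic lanes
gbr_1000_copies = copies_per_diploid_genome(8.0e8, equivalents)
gbr_6700_copies = copies_per_diploid_genome(1.7e8, equivalents)

# Plasmid standard: 0.166 ng of the 1440 bp band (lane A 5 in Table 5).
standard_molecules = molecules_from_mass(0.166, 1440)
```

With these inputs a 1.0 ug lane holds 50,000 genome equivalents, which yields 16,000 and 3,400 copies per diploid genome. The 0.166 ng standard corresponds to roughly 1.1 x 10^8 molecules under the assumed base pair mass.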

The  nucleotide  compositions  of  Gbr-1000  and  Gbr-6700 
were  determined  and  regions  homologous  to  the  ss  rRNA 
coding  region  compared  with  available  plant  sequences 
(Table  6).  The  Gbr-1000  sequence  contains  49.5%  A+T 
nucleotides and 50.5% G+C nucleotides. The 3770 base
sequence from Gbr-6700 is A+T rich, containing
58.6% of these nucleotides. The 1814 nucleotides
homologous to the ss rRNA coding region, as inferred by
sequence similarity, are also A+T rich. This region
contains 61.3% A+T nucleotides.
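
Base composition values such as these are simple proportions of A and T among the unambiguous bases. A minimal sketch follows. The function name is illustrative, and ignoring any character other than A, C, G or T is an assumption about how ambiguous positions would be handled.

```python
def at_percent(sequence):
    """Percentage of A+T among the A, C, G and T characters of a sequence."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at_count = sum(1 for base in bases if base in "AT")
    return 100.0 * at_count / len(bases)

# Toy input only; the full Gbr-1000 and Gbr-6700 sequences are in Figure 10.
toy_value = at_percent("AATTGC")
print(f"{toy_value:.1f}% A+T")
```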

The  minimum  number  of  observed  nucleotide 
substitutions  between  the  ss  rRNA  coding  region  of  Gbr-1000 
and the homologous region of Gbr-6700 was estimated (Table
7). Substitutions replacing G and C nucleotides with A and
T nucleotides are observed at a much higher frequency than
other substitutions and represent 91% of the total
substitutions. Transitions of G>A and C>T are the most
common, representing 75% of the total observed
substitutions.
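
A tally of this kind can be sketched by walking the aligned columns and recording each difference as a directed change. The sketch below treats the Gbr-1000 base as the original state and the Gbr-6700 base as the derived state. That choice is an assumption made for illustration, in line with the interpretation of Gbr-6700 as a pseudogene. The example uses the 30-base stretch upstream of the coding region shown in Figure 10, so its counts are not those of Table 7. All names are hypothetical.

```python
from collections import Counter

def tally_substitutions(reference, derived, gap="-"):
    """Count directed differences (e.g. 'G>A') between two aligned sequences."""
    if len(reference) != len(derived):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for ref_base, der_base in zip(reference, derived):
        if gap in (ref_base, der_base) or ref_base == der_base:
            continue  # skip gapped columns and identical positions
        counts[f"{ref_base}>{der_base}"] += 1
    return counts

def gc_to_at_fraction(counts):
    """Fraction of changes that replace G or C with A or T."""
    total = sum(counts.values())
    gc_to_at = sum(n for change, n in counts.items()
                   if change[0] in "GC" and change[2] in "AT")
    return gc_to_at / total if total else 0.0

gbr_1000 = "TTTGTCGAGTCGGATGCGAAGAGATGTCCT"
gbr_6700 = "TTTGTCAAGTCAAATGAAAAGAGATCTCCT"
counts = tally_substitutions(gbr_1000, gbr_6700)
fraction = gc_to_at_fraction(counts)
```

In this short example five of the six differences replace G or C with A, the same direction of bias described for the coding region.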

A  search  was  conducted  of  Genbank  data  files  using  the 
sequence  of  the  1100  nucleotide  insert  region  of  Gbr-6700. 
No  large  regions  (>20  bp)  sharing  significant  sequence 
similarity  were  found  using  an  80%  minimum  match  search 
parameter.  One  region  at  the  3'  end  of  the  insert  was 
found  to  share  >80%  sequence  similarity  with  15  to  20  base 
segments  from  a  wide  variety  of  sequences.  However, 
examination  of  alignments  generated  for  these  sequences 
reveals  that  the  high  degree  of  similarity  can  be 
attributed  to  homonucleotide  A  runs  contained  within  the 
sequences.  Therefore  the  origin  of  the  insert  in  Gbr-6700 
cannot be determined at present.

Figure 10. The sequence from Gbr-6700 (top sequence) is
aligned with the sequence from Gbr-1000
(bottom sequence). The Gbr-6700 sequence is
numbered approximately every 100 bp with dots
(.) marking every 10 bp from position one.
The Gbr-1000 sequence is numbered below the
alignment with +1 indicating the first
nucleotide position of the ss rRNA coding
region. Positions which contain identical
nucleotides in both sequences are indicated by
a vertical line between the two sequences.

100
TCTAGAGTCGATCAAAGTTAGAACATAGGTGTAAAGGGTATCGTACGGGATACTCCTCCGAGGGTCCTGTGAACCAAAAGATGACATCGTCCTTCTCCAA 

200 
CTAGTTTTTGTGCATCGCCCCCCCTCCCTCAAATATGTCCCTCCACCATATCAGATAATCCCCTCCCCTCAGGGTTATGAATTTAAACTTGTTCCCTCTC 

300 
TCTTAGACCATTAGTCCACCTCCTCCTCAAGTTTGCTCAAGTCTTAAGGTTAATGAGGTATCAATCCCCTTGCTCTTTATTATTGAACATCGCCCATAGT 

400 
ATGTTCATTTTTTTTACCATCTCTACGAGCCCGCCTTCACCTATATCCACCCTCATCTTTCTTCACTGTCAGAATTTATGTTCCATCACCCGCAAGAATA 

500 
TATCCCACTGCGCTCCTTGAATATATTCATCCCTTCCTCTGCTTTATCGTTGAATCTCTCTAAGAGAGAGTTTGTCAAGTCAAATGAAAAGAGATCTCCT 

iiiiii  nil  III  iiiiiii  nil 

TTTGTCGAGTCGGATGCGAAGAGATGTCCT 
-98" 

CCC-ACTATTTTGATGGGTGTCTTGTCTTCTATT--ATAGCGAGTGGGGTGCCCAACGTTCAAGCATGCTACCTAGTTGATCATGCTAGTAGTCATATGC 

III  II  I  II  II  III  I  II  nil  II  I  I  IIIIII   III  nil  iniiii  i  iiiiinn  iiiiiii  in  innnnnn 

CCCAACCAGTTCGACGGGCGCCTCGTCTCCTGCTGCACAGCGAG-CGGGCGCCCGACGTTCAGGGATGCTACCTGGTTGATCCTGCCAGTAGTCATATGC 

-50"  +1

697 
TTGCCTCAAAGAGTAAGCCATGCATGTGTAAGTATGAACTCTTTCAAACTATGAAACTGTGAATGGATCATTAAATCAGTTATAGTCTATTTGATATTAC 

III  iiiiiin  iiiiiniiiiiiiiiiiiiiiiiiiiiiiiii  III  iiiiiiii  IIIIII  iiiiiiiiiiiiiniiii  i  iiiiii  in 

TTGTCTCAAAGATTAAGCCATGCATGTGTAAGTATGAACTCTTTCAGACTGTGAAACTGCGAATGGCTCATTAAATCAGTTATAGTTTCTTTGATGGTAC 

100" 

797 
CTTACTACTCAGATAAGCGTAGTAATTCTAGAGATAATATGTGCACCAAATCCTAACTTCTAGAAGGGAAACATTTATTAGATAAAAGGTTGACGCGGGC 

iiiiiiiiii  inn  innnnnnni  iiiii  nnininni  iiiiii  niiiii  innnniiniini  iniiiin 

CTTACTACTCGGATAACCGTAGTAATTCTAGAGCTAATACGTGCACCAAATCCCGACTTCTGGAAGGGACGCATTTATTAGATAAAAGGCCGACGCGGGC 

200" 

TCGCTTGCTACTTTAGTGATTCATGATAATTCAACATATACCATTGCCCTGGTGCTAGCGATGCTTCATTCAAATTTCTATCCTATCAACTTTTAATGTT 

III!  Ill  III  iiiiiiiiiiiiii  II  11  II  II  IIIIIIIIII  nil  iiniiiiiiiiiiiii  iiiiiiiinii  in  i 

TCGCCCGCTGCTTCGGTGATTCATGATAACTCGACGGATCGCACGGCCCTGGTGCCGGCGACGCTTCATTCAAATTTCTGCCCTATCAACTTTCGATGGT 

300" 

997 

AGGATAGAGGCCTACCATGGTGGTGATGAGTGATAGATAATTAGGGTTCAATTTTAGAGAGGGAGCTTGAGAAACAACTACCATATCTAAGGAAGGAAAC 

llllllllllllllllllllllllll  I  nil   II  lllllllllll  III    llllllllli  IIIIIIII   IIIIII  III  IIIIIIII  I  I 

AGGATAGAGGCCTACCATGGTGGTGACGGGTGACGGAGAATTAGGGTTCGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGC 

400" 

.   1097 
AGGCGCACAAATTATCCAATCTTGACATGGGGAGGTAATGATAATAAATAACAATATTGGGCTCATCGATTCTGGTAATTGGAATGAGTATAATCTAAAT 

IIIIII  IIIIIII  IIIIII  IIIII  lllinill  III  IIIIIIIIIIIIII  llllllllllll  lllllllllllllillllll  lllllllll 

AGGCGCGCAAATTACCCAATCCTGACACGGGGAGGTAGTGACAATAAATAACAATACTGGGCTCATCGAGTCTGGTAATTGGAATGAGTACAATCTAAAT 

500"

.   1197

CCCTTAATGAGAATTCATTGGAGGGAAAGTCTGGTGCCAACAACCACAGTAATTCTATCTCTAATAGTGTATGATTAAGTTTTTGCAGTTAAAAATATCA 

lllllil  III  II  llllllllll  lllllllllllll  II  II  I  lllllll  I  III  mil  nil   lllllll  lllllllllllll   II 

CCCTTAACGAGGATCCATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTTAAGTTGTTGCAGTTAAAAAGCTCG 

600" 

I 
.   1297  j 

TAGTTGGATATTGGGTCGGGTTGGTTGGTCTACCTTTTTGGTTTGCACCGATCAGTCTATCCCTTCTTCCTTATTGCATTGTTCAATGGAGGGTAGAGAG  ! 

iiiiiiiii  mil  mil  II  nil  iiiiii  in  iiiiiii   i  ii  iiiiini  ii 

TAGTTGGATCTTGGGCCGGGTCGGCCGGTCCGCCTTTTCGGTGTGCACCGGCCGCTCCGTCCCTTCTGCC 

700'  I 

.   1397 
CCAGAGGATTGTTTAAGTATAGAGGCTTGAAGACTTAGATTGTATACCTAGTCAACCTCCCTAGTGAGTATAGAGTTGGGGAGAAGTGGAGTTGGGTCCA 

i 

.   1497

TCGCACAAAATCATGGGTTGCGAAGCATGCTTCGCAATGTCAGAATCAGGGGTTCTTGGTGCAAGAGGTAGCTTGGGTTGCATGGGAGTCAAAGCACTCT  I 

.  1597       ; 

ATTGAGGCTTTGCAATTTTACAAAGATTTTAATTTAGTTGATTCAAGGGATTTTTTTAAGAGAGCAATGAGGTATGAGGATGTGTTCCCTGATAAAGTTT  ] 

.   1697  j 

TCTTACTGCTCTTGGCTAGGGTTGGGGGGTGTCCATTAGTTAAGAGAGTTGTTAGGTTGGGTGTAGTGGATCATAGGTTAGATTTAGGTGTTCGCTCTGT  | 

I 

.   1797 
AATGAAGAAGGAGGTGATTCTCTTTTTCATGTCTTGATTAAATTCCCAAGCTTGGTTCATATTTGGAGTAGGGTTGAAGGTCAAGAGAGTCTTGTGTCAA 

1 

.   1897  I 

ACCTTTTACTGACCAGTGATGTTATCGATGATAGGTTTTTACTCGCATTACTCCTTGGTGGGGAGATAGGCAAGAGGCTCCGTAATTGGTTATGTGGTCG 

.   1997  j 

ATCGTAGTGGAGGGGCACCATAGGCTCTGTACATGTAGTGAACCTCCTTGCTGAGGTCATCCCACTATATAACTAAGCCCTAAGGGATTGGGATGGGTCT 

.   2097 
CAACCCATCCTAGTTGAGGGGCCTTTAAGAAGTGATACCACCATTAAACCAATGCCTATAAGGGTAGGGATGGCCTTTCATCAAGGCACATAGTGGTCAA 

.   2197  ; 

GGTAATGAGTGTTTACCTAGGTTAGAGTAACACCTAGGTTGGTTGCTAATAAGTAGGGTGTGACACTAGGTGAAGCACACTAGATTTGTTGGCGATCAAC  | 

i 

.   2297  I 

CTAAGTGGGTATACTTGAACCTTTGGAGAATTCCCCAAGTCATTGGGACTGAGAGAAAAAGTTCTCAAAGAGTGAGTATCCTCCAAGCCATGATATAACA  j 

I 
.   2397  I 

TGGGAGCCTAATAAGATCGAGGTGTGCTTTTAGCCTTGGCTTTGCCCATCTCAAATGAAAAAAATAAAAAATCTGTTAGTGACTCACTCCTGGCCTTAAT  i 

I  I  II  llllllllllllll 
GGCGGCGCGCTCCTGGCCTTAAT 
702' 

.   2497  1 

TGGGTAGGTCGCGACTCTAGCATCGTTACTTTAAAAAAAATTGAGTGCTCAAAGCAAGCCTATGCTCTGAATACATTAGCATGGAATAACATGATAAGAG 

III  I  lllllll  III  II  IIIIIIIII  IIIIII  I  miiiiiiiiiiiiiiiii  innnnnnmimiiiiiii  mi  in 

TGGCTGGGTCGCGGCTCCGGCGCCGTTACTTTGAAAAAATTAGAGTGCTCAAAGCAAGCCTACGCTCTGAATACATTAGCATGGAATAACGCGATAGGAG 

800' 


Figure  10 — continued 
2597

TCTGGTCCTATTGTGTTGGCCTTTAGGACAGAAGTAATGATTAATAGAGATGGTTGAGGGAATTCATATTTCATTGTTAGAGGTGAAATTCTTGGATTTA 

lllllllllllllllllllllll      INI    I    lllllllllllllll    11    III    I    III    MM    lllllllllll    llllllllllllllllllllll 
TCTGGTCCTATTGTGTTGGCCTTCGGGACCGGAGTAATGATTAATAGGGACGGTCGGGGGCATTCGTATTTCATTGTCAGAGGTGAAATTCTTGGATTTA 

900" 

.   2697 
TGAAAGATGAACCACTGCAAAAGAATTTTCCAAGGATGTTTTCATTAATCAAGAATGAAAGTTGGGGACTCAAATATGATTAGATACCATCCTAGTCTCA 

lllllll  llllllllll  MM  MM  llllllllllllllllllllllllll  lllllllllll  Mi  II  I  III  lllllll  lllllllllll 

TGAAAGACGAACCACTGCGAAAGCATTTGCCAAGGATGTTTTCATTAATCAAGAACGAAAGTTGGGGGCTCGAAGACGATCAGATACCGTCCTAGTCTCA 

1000* 

.  2796 
ACCATAAATGATGCCGACTAGGGATCGACGGAGGTTGCTTTAAGGACTCCGTCAGCACCTTGTGAGAAATCAAAGTTTTTGGGTTCC-AGGGGAGTATGG 

llllllll  llllllllllllllllll  MM  llllllllllllllllll  I  lllllilllllllllllllllllllllllllll   lllllllllll 
ACCATAAACGATGCCGACTAGGGATCGGCGGATGTTGCTTTAAGGACTCCGCCGGCACCTTGTGAGAAATCAAAGTTTTTGGGTTCCGGGGGGAGTATGG 

1100" 

••..■■•■■•  2895  1 

TC-CAAGGATAAAACTTAAAGGAATTGATGGAAGGGAATCGCCAAGAGTGGAGCCTACGACTTAATTTAACCCAACATGGAGAAACTTACCAGGTCCAGA 

II  Mill  I  MMIMMMMIMI  Mlllll  I  I  III  lllllllllll  II  Mllllll  M  lllll  II  III  M  llllllllllllll  \ 

TCGCAAGGCTGAAACTTAAAGGAATTGACGGAAGGGCACCACCAGGAGTGGAGCCTGCGGCTTAATTTGACTCAACACGGGGAAACTTACCAGGTCCAGA  ' 

1200"  \ 

.   2995  j 

CATAGTAAGGATTGACAAATTGAGACCTCTTTCATGATTCTATGGGTGGTGGTGCATGGTCGTTCTTAGTTGGTGGAGCGAGTTATCTGGTTAATTTTTT 

IMMMIMMMIM  lllllll  lllllll  II  llllllllllllll  lllllllll  MMMMMMMIMMM  II  IMIMMIM      I  j 

CATAGTAAGGATTGACAGATTGAGAGCTCTTTCTTGATTCTATGGGTGGTGGTGCATGGCCGTTCTTAGTTGGTGGAGCGATTTGTCTGGTTAATTCCGT  ! 

1300"  ; 

.  3095  ; 

TAAAGAATGAGACCTCAACCTACTAACTAGCTATACGTAGGTTTGCCTTTGTGGCCAACTTCTTAGAGGGACTATGTCCCTTCAAGCCATGGAAGTTTAA 

III  III  lllllllll  III  llllllllllll  II  lllll  lllll  lllllll  lllllllllillllllll  lllllll  lllllllllllll  I 
TAACGAACGAGACCTCAGCCTGCTAACTAGCTATGCGGAGGTTCGCCTTCGTGGCCAGCTTCTTAGAGGGACTATGGCCCTTCAGGCCATGGAAGTTTGA 

1400"

.  3195

GGCAATAATAGGTTTGTGATACCCTTAGATATTATGGGTTGTACGTGCACTGCATTGTTGTATTCAACAAGTCTATAACCTGGGCAGAGAGGCCTAGGTA  ' 

IIIIIIM  MM  llllll  lllllllll  II  MM   I  III  II  II  II  II  llllllllll  llllllllllllllll  llllllll   II  I 

GGCAATAACAGGTCTGTGATGCCCTTAGATGTTCTGGGCCGCACGCGCGCTACACTGATGTATTCAACGAGTCTATAACCTGGGCCGAGAGGCCCGGGAA 

1500" 

.  3295  ' 

ATCTATCAAAATTTCATCATGATGGGGATAGATCATTATAATTATTGATCTTAAACAAGGAATTCCTAGTAAGCGCTAGTCATCAAATCACATTGACTAT  ! 

Mil    I  llllllllll  llllllllllllllllll    lllllllllllllllll  MIMIIIMIMMIIM  lllllllll  II  I  lllilil  J 

ATCTGCCGAAATTTCATCGTGATGGGGATAGATCATTGCAATTATTGATCTTAAACGAGGAATTCCTAGTAAGCGCGAGTCATCAACTCGCGTTGACTAC  I 

1600"  ! 

.  3395 

GTCCATGCCCTTTGTACACAATTCCCATCCCTCCTACCAATTGAATGATTCGGTGAAGTGTTTGGATCGTGTCAACAATGATAGTTCACCACTGGCGACA

Mil  lllllllllllllll    III  II  llllllll  llllllllll  llllllllllll  llllll  I  I  II  I  I    MM  II  I  llllll  I 

GTCXCTGCCCTTTGTACACACCGCCCGTCGCTCCTACCGATTGAATGATCCGGTGAAGTGTTCGGATCGCGCCGACGACGGCGGTTCGCCGCCGGCGACG  i 

1700" 

.  3494
TTGCGAGAAGTTTATT-AATGTTATCATTTAGAGGAAGGAGAAGTCATAGCAAGGTTTTCGTAGGTGAACCTGCAGAAGGATCATTGATCATCCTGATGT 

I  llllilllll  III  II   lllllllllllllllllllllllll  II  llllllll  lllllllllllllll  llllllllllllll  MM  I   II 

TCGCGAGAAGTTCATTGAACCTTATCATTTAGAGGAAGGAGAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGATCCCGTCGT 

1811" 


Figure  10 — continued 
3593
TTGTAAGAAGAATCAGGATACAACGACTTCAAAAGGGGCAACCCACACATTGTCTGGCCGT-ATTACCCTTGACTGCGAATGGCTACATGATGGATATTG 

III  III  I  I  II       III  I     III  I      III    nil  I    II  I    III 

CCGTA--AAGGAACGGGGCGTGGCGAACT-GTGAGGATCGTGGGATGGACCCCCGTCCCGTCGTCGGCCGTCGTTGCCCTCC 

1900"

GTAGTGCAATTGTCTGACTGGTAAGGACGCACCCTTCATCCTATTATCTAGGGATATCGGGGTGACCCACCTTCCATCTGACCGTGTTACCCTCTGACCA 

3770 
CGAATAGCTATGTGATGGTGTTGCCAGCATGTTCATTATGTGGGTAAGGACATACCCTTAGTCCACCCTCCAAGTGC 


Figure  10 — continued 
Table 5. Genomic Copy Numbers for Ginkgo ss rRNA Sequences


Gel   DNA        Quantity of DNA         Optical     Number of    Genomic      Copies
Lane  Type       Lane      1440 bp Band  Density     Molecules    Equivalents  per 2C

A 1   Genomic    2.0 ug    ---           3,803,790   ---          100,000
A 2   Genomic    1.0 ug    ---           2,289,130   8.0 x 10^8   50,000       16,000
A 3   pGbr-1000  0.01 ng   0.002 ng      ---         1.1 x 10^6
A 4   pGbr-1000  0.1 ng    0.017 ng      ---         1.1 x 10^7
A 5   pGbr-1000  1.0 ng    0.166 ng      216,520     1.1 x 10^8
A 6   pGbr-1000  10.0 ng   1.660 ng      2,867,820   1.1 x 10^9

B 1   Genomic    2.0 ug    ---           3,559,600   ---          100,000
B 2   Genomic    1.0 ug    ---           1,651,590   1.7 x 10^8   50,000       3,400
B 3   pGbr-6000  0.01 ng   0.002 ng      ---         1.5 x 10^6
B 4   pGbr-6000  0.1 ng    0.024 ng      ---         1.5 x 10^7
B 5   pGbr-6000  1.0 ng    0.236 ng      1,426,800   1.5 x 10^8
B 6   pGbr-6000  10.0 ng   2.36 ng       11,955,760  1.5 x 10^9



Optical  densities  were  computed  for  the  areas  of 
labelled  autoradiogram  bands.  The  number  of  genomic 
equivalents  in  the  genomic  digest  lanes  was  calculated 
using a value of 20 pg per diploid cell (Ohri and Khoshoo
1986). The number of sequence copies was estimated by
interpolation  from  values  obtained  for  known  quantities  of 
plasmid  DNA. 
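
The arithmetic behind the last three columns of Table 5 can be
sketched as follows. This is only an illustration, not the
procedure used for the autoradiogram analysis: the function
names are hypothetical, the average mass of 660 g/mol per base
pair is an assumed textbook value, and the 1,440 bp fragment
length passed to fragment_molecules is an assumption made for
the example.

```python
AVOGADRO = 6.022e23      # molecules per mole
BP_MOLAR_MASS = 660.0    # assumed average g/mol per base pair of duplex DNA


def genomic_equivalents(dna_ng, pg_per_2c=20.0):
    """Diploid genome equivalents in a given mass of genomic DNA (ng)."""
    return dna_ng * 1000.0 / pg_per_2c


def fragment_molecules(band_ng, length_bp):
    """Number of double-stranded fragments of length_bp in band_ng of DNA."""
    grams = band_ng * 1e-9
    return grams / (length_bp * BP_MOLAR_MASS) * AVOGADRO


def copies_per_2c(molecules, equivalents):
    """Sequence copies per diploid (2C) genome."""
    return molecules / equivalents


# Lane A 2: 1.0 ug of genomic DNA and 8.0 x 10^8 hybridizing molecules.
equivalents = genomic_equivalents(1000.0)
print(equivalents, copies_per_2c(8.0e8, equivalents))

# Lane A 5: 0.166 ng of insert band, assumed here to be 1,440 bp long.
print(f"{fragment_molecules(0.166, 1440):.2e}")
```

With 20 pg per diploid cell, 1.0 ug of genomic DNA corresponds
to 50,000 genome equivalents, so the copy number per 2C is the
interpolated number of molecules divided by 50,000.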
Table 6. Nucleotide Compositions for Plant ss rRNA Genes


NUCLEOTIDES   G.b.     G.b.     Z.p.     G.m.     O.s.     Z.m.
              6700     1000

NUMBER        1814     1811     1813     1807     1812     1810

A              553      443      444      451      444      447
             (30.5)   (24.5)   (24.5)   (25.0)   (24.5)   (24.7)

C              305      411      394      397      421      419
             (16.8)   (22.7)   (21.7)   (22.0)   (23.2)   (23.1)

G              397      503      506      490      509      504
             (21.9)   (27.8)   (27.9)   (27.1)   (28.1)   (27.8)

T              559      454      469      469      438      440
             (30.8)   (25.1)   (25.9)   (25.9)   (24.2)   (24.3)

A + T         1112      897      913      920      882      887
             (61.3)   (49.5)   (50.4)   (50.9)   (48.7)   (49.0)

G + C          702      914      900      887      930      923
             (38.7)   (50.5)   (49.6)   (49.1)   (51.3)   (51.0)


The total number of each nucleotide is given, with its
percentage of the total nucleotides in the sequence
listed in parentheses below. Taxa included are: Ginkgo
biloba (G.b.), Zamia pumila (Z.p.), Glycine max (G.m.),
Oryza sativa (O.s.), and Zea mays (Z.m.).
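
The percentages in Table 6 follow directly from the base
counts. A minimal sketch of that calculation is given below;
the function names are hypothetical and the two dictionaries
simply restate the counts listed in the table for the two
Ginkgo sequences.

```python
from collections import Counter


def count_bases(seq):
    """Count A, C, G and T in a sequence, ignoring gaps and other symbols."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    return {base: counts[base] for base in "ACGT"}


def at_percent(counts):
    """Percentage of A + T among the counted nucleotides."""
    total = sum(counts.values())
    return 100.0 * (counts["A"] + counts["T"]) / total


# Base counts restated from Table 6.
gbr_6700 = {"A": 553, "C": 305, "G": 397, "T": 559}
gbr_1000 = {"A": 443, "C": 411, "G": 503, "T": 454}

for name, counts in (("Gbr-6700", gbr_6700), ("Gbr-1000", gbr_1000)):
    print(name, sum(counts.values()), round(at_percent(counts), 1))
```

The same at_percent function can be applied to the output of
count_bases when starting from a raw sequence rather than from
tabulated counts.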
Table 7. Substitution Frequencies for Ginkgo Sequences


Subst.   No. of   % of        Subst.   No. of   % of
Type     Subst.   Subst.      Type     Subst.   Subst.

A > C       0       0         C > A      20       8.2
A > T       3       1.2       T > A       3       1.2
C > G       4       1.6       G > C       2       0.8
G > T      20       8.2       T > G       3       1.2
A > G       4       1.6       G > A      96      39.2
C > T      87      35.5       T > C       3       1.2


The minimum number of nucleotide substitutions between
the two Ginkgo ss rRNA-like sequences was estimated. The
nucleotide to the left of the arrow is that found in the
Gbr-1000 sequence and the nucleotide to the right of the
arrow is that found in the Gbr-6700 sequence.
Discussion

A ss rRNA Pseudogene from Ginkgo biloba

During screening of the Ginkgo genomic library, a group
of clones was purified that contained a ca. 12 kb Bam HI
fragment which hybridized to both 5' and 3' ribosomal
probes. Restriction analysis indicated that the ss
rRNA-like coding sequence spanned a region of DNA
approximately one kilobase longer than other eukaryotic ss
rRNA sequences. A representative from this group, lambda
Gbr-06, was selected for further study (Fig. 9). The 3770
base sequenced region of Gbr-6700 contains a region
homologous to the Ginkgo ss rRNA coding region of Gbr-1000,
as inferred from the high sequence similarity shared over a
1.8 kb region of the two sequences. This ss rRNA-like
coding sequence is interrupted by a 1.1 kb insert 700 bases
from the 5' border.

Comparison  of  Ginkgo  ss  rRNA-Like  Sequences 

A pairwise alignment was constructed for the sequences
of Gbr-6700 and Gbr-1000 (Fig. 10). The ss rRNA-like
coding regions share 86% sequence similarity. This is
lower than the similarity observed between ss rRNA
sequences from different seed plant taxa, which ranges from
92% to 97%. The alignment includes approximately 100
nucleotides of 5' and of 3' flanking sequence available for
both clones. The sequence similarity for the 5' flanking
region is 74%, from which one can infer homology of these
regions. The 3' flanking regions of these sequences do not
share significant similarity; it is less than 45% for this
alignment.
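
Percent similarity values of this kind can be computed from a
gapped pairwise alignment as sketched below. How gaps were
scored for the values reported here is not stated, so the
count_gaps switch is an assumption that shows both common
conventions; the function name and the two short sequences
are hypothetical and are not taken from Fig. 10.

```python
def percent_identity(seq_a, seq_b, count_gaps=True):
    """Percent identical columns between two aligned, gapped sequences.

    With count_gaps=True a gap opposite a base counts as a mismatch;
    with count_gaps=False such columns are left out of the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue                      # column carries no information
        if not count_gaps and "-" in (a, b):
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared


# Toy alignment: ten columns, one gap, one base mismatch.
top = "ACGT-ACGTA"
bottom = "ACGTTACCTA"
print(percent_identity(top, bottom))
print(round(percent_identity(top, bottom, count_gaps=False), 1))
```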

The ss rRNA-like sequence of Gbr-6700 is A-T rich,
containing 61.3% of these nucleotide residues (Table 6).
In contrast, the coding region of the Ginkgo ss rRNA
sequence from Gbr-1000 is only 49.5% A and T residues.
Other plant ss rRNA sequences range from 48.7% to 50.9% A
and T residues, so all of them vary by less than 1.5% from
a 50:50 ratio of A-T to G-C nucleotides.

The data obtained from these experiments suggest that
the Gbr-6700 sequence represents a ss rRNA pseudogene that
is present in multiple copies in the Ginkgo genome. Large
insertions have not been reported to date for plant ss rRNA
sequences. The rRNA tandem repeat is transcribed as a
single unit and subsequently processed in the nucleolus to
yield mature ss rRNA, 5.8S rRNA and ls rRNA. A splicing
mechanism would therefore be required to yield a mature ss
rRNA from Gbr-6700, and no such splicing mechanisms are
known to occur in plant rRNA systems.

The homologous regions of the two Ginkgo ss rRNA-like
sequences are more highly diverged than are sequences from
distantly related seed plant taxa. This observation is
consistent with the hypothesis that evolution of the
Gbr-6700 sequence is not limited by the functional and
structural constraints imposed on expressed plant ss rRNA
genes. Observed frequencies of substitution types are also
very different from those estimated for other plant ss rRNA
sequences (Table 7). There is an overall pattern of
increase in A+T content of the Gbr-6700 sequence,
consistent with the substitution frequencies observed for
the sequence. Treating Gbr-1000 as the primitive sequence,
C>T and G>A transitions together represent 75% of total
substitutions (36% and 39%, respectively), whereas T>C and
A>G transitions make up only 3% of the observed
substitutions. The latter two types of transitions are
less frequent than C>A and G>T transversions, which
represent 16% of observed substitutions and also increase
the A-T content of the sequence. Substitutions replacing G
or C with A or T account for 91% of the substitutions
observed between the Gbr-1000 and Gbr-6700 sequences.
Transversions in the opposite direction, A>C and T>G, occur
at much lower frequencies, similar to those for T>C and A>G
transitions and for the remaining types of transversions.
The overall A+T content of the Gbr-6700 ss rRNA-like
sequence, at 61%, is substantially higher than that of any
other plant ss rRNA sequence reported, all of which vary by
less than 1.5% from a 50:50 ratio. The sequence and
nucleotide composition data for Gbr-6700 are consistent
with the hypothesis that this sequence does not produce a
functional ss rRNA molecule. The sequence does not appear
to be conserved enough to conform to the secondary
structure models proposed for plant ss rRNA molecules.

Genomic copy number experiments were used to estimate
the number of copies of each sequence present in the Ginkgo
genome (Fig. 11). Interpolation from values obtained for
known quantities of plasmid indicates that approximately
16,000 copies of Gbr-1000 are present in the diploid Ginkgo
genome. This value is within the range reported for ss
rRNA genes in the genomes of other seed plant taxa. These
data also indicate that the Gbr-6700 sequence is present in
about 3,400 copies per diploid genome and does not
represent a single copy variant of the Ginkgo ss rRNA gene.

Homogeneity  of  sequences  has  been  observed  for  members 
of  the  rRNA  gene  families  of  many  eukaryotic  taxa  (Rogers 
and  Bendich  1987,  Jorgensen  and  Cluster  1988).  Fluctuation 
in  rRNA  gene  copy  number  and  homogenization  of  these  copies 
are  proposed  to  occur  via  unequal  crossover  and  gene 
conversion.  Variants  can  then  be  fixed  in  the  population 
by mechanisms of molecular drive (Dover 1982).

Using the data obtained in these experiments, the
following hypothesis is proposed. The Gbr-6700 sequence
arose by a transposition event in which a 1.1 kb piece of
DNA was inserted into a single copy of a ss rRNA coding
region. The variant was then propagated by unequal
crossing over and/or other mechanisms which produce copy
number variation in rRNA gene families. Homogenization of
the variant copies is occurring via the same mechanisms of
gene conversion which cause concerted evolution of rRNA
gene families. At present there are no chromosomal maps
available for Ginkgo biloba. Therefore, it cannot be
determined at this time whether these two sequence families
are located in a single nucleolar organizing region or on
non-homologous chromosomes. Future experiments mapping
these two variant gene families may help provide insight
into the mechanisms involved in copy number fluctuation and
concerted evolution of rRNA gene families.


CHAPTER  6 
SUMMARY  AND  CONCLUSIONS 


Phylogenetic Analysis

The study presented herein indicates that Gbr-1000
represents a single copy of the ss rRNA gene from the major
ribosomal repeat in Ginkgo biloba. Comparative analysis of
Gbr-1000 with other plant ss rRNA sequences shows that
these sequences share high sequence similarities, over 90%
for seed plant taxa. Phylogenetic analysis of these
sequences produces a clearly most parsimonious network for
seed plants which is highly congruent with relationships
inferred from morphological and anatomical data. Rooting
the network using the algal sequence as the outgroup also
yields a single most parsimonious tree. However, the
placement of the root using the Chlamydomonas sequence is
weak compared with the resolution obtained in the ingroup
analysis. The most parsimonious tree (I, at 149 steps),
which places Ginkgo and Zamia in a monophyletic group
(clade), is congruent with most parsimonious and nearly
most parsimonious phylogenies inferred from some studies
using phenotypic characters. The next most parsimonious
trees (II and III), which require an additional five and
eight steps, respectively, are also congruent with results
of some studies using phenotypic characters. Therefore,
given the limited number of taxa examined here, the
relatively weak placement of the root using Chlamydomonas,
and evidence from morphological and anatomical studies, the
three most parsimonious trees inferred from these data (I,
II and III) should each be considered plausible
alternatives for the true phylogeny of these taxa.
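
The tree lengths quoted above, in steps, are counts of the
minimum number of character state changes a tree requires. A
minimal sketch of such a count is given below, using Fitch's
algorithm for unordered characters on a rooted binary tree.
It is an illustration only and is not claimed to be the
program used in this study; the taxon labels W to Z and their
four-site sequences are invented for the example.

```python
def fitch_steps(tree, states):
    """Minimum number of changes for one site on a rooted binary tree.

    tree is a nested tuple of leaf names; states maps leaf name to base.
    """
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):
            return {states[node]}
        left, right = (visit(child) for child in node)
        shared = left & right
        if shared:
            return shared
        steps += 1                 # no shared state: one change is needed
        return left | right

    visit(tree)
    return steps


def tree_length(tree, alignment):
    """Sum of Fitch steps over all aligned sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_steps(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
        for i in range(n_sites)
    )


# Invented four-taxon data set, for illustration only.
alignment = {"W": "AAGT", "X": "AAGC", "Y": "ACTC", "Z": "ACTC"}
tree_1 = (("W", "X"), ("Y", "Z"))
tree_2 = (("W", "Y"), ("X", "Z"))
print(tree_length(tree_1, alignment), tree_length(tree_2, alignment))
```

In this toy case the first topology is shorter by two steps, in
the same sense that trees II and III above are five and eight
steps longer than tree I.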

The resolution obtained in the ingroup analysis is
consistent with the hypothesis that ss rRNA sequences
evolve at a rate which retains informative characters for
studies at taxonomic levels covered by the major seed plant
groups. The sequence similarities between the
Chlamydomonas sequence and seed plant sequences are much
lower than those observed between seed plant taxa. This,
combined with the weak placement of the Chlamydomonas root
in this analysis, suggests that rates of evolution in the
ss rRNA genes are too rapid for these sequences to be
useful for examining taxa as distantly related as the green
algae and higher plants. Therefore, if relationships among
major seed plant lineages are to be more clearly resolved,
sequences from closer outgroup taxa (e.g., ferns) are
needed to root plant phylogenies with more confidence.

The analyses using subsets of characters based on
classes of substitutions and secondary structure indicate
that none of these types of characters improves the
resolution obtained for rooted phylogenies. Transversion
characters and characters from unpaired loop regions of the
ss rRNA molecule, which might be expected to provide better
resolution, fail to produce a single most parsimonious
solution in analyses of rooted phylogenies. These results
also support the idea that more closely related outgroup
sequences are needed to root seed plant phylogenies
inferred from ss rRNA data.

For ingroup analyses, transversion characters do
contain less homoplasy than transitions. However,
transition  characters  also  produce  a  single  most 
parsimonious  network  identical  in  topology  to  that  inferred 
from  transversions  and  the  total  data  set.  This  indicates 
that  transition  substitutions  also  provide  informative 
characters  for  phylogenetic  studies  of  plant  relationships 
at  these  taxonomic  levels.  Furthermore,  observed 
frequencies  for  the  different  types  of  transversion 
substitutions  as  well  as  for  the  different  transition 
substitutions  suggest  that  individual  rates  for  each  of  the 
twelve  substitution  types  could  be  used  as  a  basis  for 
character  weighting. 
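
The distinction between transitions and transversions, and the
idea of weighting by substitution type, can be sketched as
follows. The weighting shown, in which each observed type is
weighted by the inverse of its relative frequency, is only one
hypothetical scheme chosen for illustration; it is not a
method applied in this study, and all function names are
assumptions.

```python
from collections import Counter

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify(old, new):
    """Return 'transition', 'transversion', or None if the bases match."""
    if old == new:
        return None
    pair = {old, new}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def tally(reference, variant):
    """Count each of the twelve directed substitution types in an alignment."""
    counts = Counter()
    for old, new in zip(reference.upper(), variant.upper()):
        if old in "ACGT" and new in "ACGT" and old != new:
            counts[(old, new)] += 1
    return counts


def inverse_frequency_weights(counts):
    """Hypothetical weights: the most frequent type gets 1, rarer types more."""
    observed = {kind: n for kind, n in counts.items() if n > 0}
    most = max(observed.values())
    return {kind: most / n for kind, n in observed.items()}


print(classify("C", "T"), classify("G", "T"))
print(dict(tally("ACGT", "GCTT")))
```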

Differences in resolution obtained for unrooted
networks using character subsets based on secondary
structure positions are less clear. Homoplasy and network
score distributions are similar for loop and stem character
subsets. Approximately one third of the total characters
are from a region for which no plant secondary structure
model is available. Further refinement of a general
secondary structure model for plant ss rRNAs will allow a
more thorough examination of characters from unpaired loop
regions and paired stem regions.

Small  Subunit  rRNA  Sequences  of  Ginkgo  biloba 

The  ss  rRNA  gene  within  the  clone  pGbr-1000  is  a 
representative  of  the  ss  rRNA  genes  contained  in  the  major 
ribosomal  repeat  of  Ginkgo  biloba.  The  sequence  contained 
in  the  coding  region  shares  92-96%  sequence  similarity  with 
other seed plant ss rRNA genes. The sequence retains
highly conserved regions with the potential secondary
structure characteristics proposed for plant ss rRNA. The
nucleotide composition is similar to that of other seed
plant ss rRNAs, all of which are within 1.5% of a 50:50
ratio of A+T to G+C. The Gbr-1000 sequence is
present  in  an  estimated  16,000  copies  per  diploid  genome  in 
Ginkgo.  This  falls  within  the  range  reported  for  other 
seed  plant  taxa. 

A second ss rRNA-like sequence, Gbr-6700, was also
found, which is present in about 3,400 copies per diploid
genome. The sequence contains a ss rRNA-like coding region
which is interrupted 700 nucleotides from the 5' end by a
1.1 kb insert sequence. The sequence is A-T rich overall
and the region homologous to the ss rRNA coding region
contains 61% A+T nucleotides. This is approximately 10
percentage points higher than found for other plant ss rRNA
sequences.

The sequence similarity of the homologous regions of
Gbr-1000 and Gbr-6700 is 86%, lower than that observed
between ss rRNA sequences from different plant taxa. This,
combined with other data obtained in this study, suggests
that the Gbr-6700 sequence represents a ss rRNA pseudogene.

The Gbr-6700 sequence appears to have undergone one or
more insertional events and subsequently been propagated in
the genome. The mechanisms which cause changes in copy
number and concerted evolution within rRNA gene families
appear to be operating on the Gbr-6700 variant family as
well. Further studies of rRNA sequence variants such as
these, which are present as multiple copies in the genome,
may provide information on the mechanisms which operate in
the evolution of rRNA multigene families.